An array of problems


Political interference in the selection process for the headquarters of the Square Kilometre Array
should not go unchallenged.


When David Cameron addressed Australia’s Parliament last
November, the British prime minister referred briefly to the
Square Kilometre Array (SKA), “the world’s largest radio
telescope”. The project’s headquarters, he noted, were in Manchester, 
UK. Not so. The location of the SKA headquarters — a political and 
scientific prize — was due to be decided last week. It was a two-horse 
race: the United Kingdom or Italy. But the date came and went with no 
news. Astronomers have been left scratching their heads. 

Nature has seen internal documents that explain both the delay and 
Cameron's expectancy. Italy won, and the United Kingdom kicked up 
a fuss. It threatened to pull out. It implied that Italy could not be relied 
on. It demanded (and will get) a recount. It acted, in other words, as a
playground bully. Science has never been immune to the ugly reality of 
politics, but last week’s unseemly gamesmanship is a particularly sorry 
example, and one that should not be allowed to stand unchallenged. 

It is true that the Jodrell Bank Observatory near Manchester has 
acted as a temporary base for the SKA since 2012, and that the British 
would like that to continue. But the merits of two possible sites for a 
permanent home — the other is a historic observatory in Padua — 
have been the subject of an admirably transparent selection procedure, 
which the United Kingdom is now trying to undermine. 

The two bids were assessed on a precise set of criteria, including 
political commitment to provide the extra financial support expected 
of a host and the quality of the research environment. The SKA board
agreed on a timetable and chose a selection panel to assess the bids and
recommend the winner. The panel comprised SKA board directors 
from three of the organization’s 11 member countries — Australia, 
South Africa and the Netherlands — and a representative from the 
European Southern Observatory (ESO), an international astronomy 
organization headquartered in Garching, Germany. The panel's 
recommendation to the SKA board last week was crystal clear: both 
locations satisfied the criteria, but Padua was the better option. 

When the United Kingdom saw that it had not won, it tried to 
change the rules, ramping up the pressure by circulating government- 
level letters to SKA board members. One from the head of the UK 
Science and Technology Facilities Council says “the [panel's] report 
does not appear to properly account for the scale and approval status 
of our financial commitment”, and that “any decision on the headquarters
must consider the broader ways this will affect the project
and in particular the way in which it could affect the level of political 
commitment to the project”. Another (unsigned) letter from the UK 
Department for Business, Innovation and Skills says: “All things being
equal — which they are in terms of meeting the HQ criteria — it makes 
no sense to dramatically increase the risk of the project by chang- 
ing leadership from the UK to Italy ... Transferring leadership would 
require the UK to radically re-assess participation in the project.” 

The SKA aims to use the globe as a giant radio telescope to image 
the early Universe, when the first stars and galaxies were forming. 


With a project this ambitious, it is perhaps not surprising that fights 
for control get dirty. In March 2012, South Africa was judged slightly 
better as a site to host the SKA telescope than Australia, but a politi- 
cal storm led to the decision to share the instruments. South Africa 
is building 3,000 dishes, and an even larger number of antennas are 
being installed in Australia. 

Any competition for hosting the headquarters would have been
undertaken in the knowledge that a non-UK winner would require a
physical move. Italy may have a reputation among tabloid newspaper
readers as Europe's clown — thanks in part to years under former
prime minister Silvio Berlusconi — but hard-nosed scientists looking
at its reliability in international scientific projects do not need
to stoop to stereotypes. Italy is a reliable partner in both CERN,
Europe's particle-physics lab near Geneva, Switzerland, and the ESO,
for example. The country has competently headed organizations
including the International Centre for Theoretical Physics and the
International Centre for Genetic Engineering and Biotechnology for
decades without problems.

Under pressure from the United Kingdom, the SKA board deferred 
a vote on the headquarters site to its next meeting at the end of April. 
It gave both countries until 20 March to submit extra material to the 
selection panel to confirm financial support, including their commit- 
ments if they are unsuccessful, and to address vague “operational and 
schedule matters; and organisational and reputational matters”. The 
board also asked the panel to provide it with a comparative analysis 
“without an overall recommendation” by 10 April. These new criteria 
represent a move away from a transparent selection process to one that 
is based on murkier ground.




All in good time


Stratigraphers have yet to decide whether the
Anthropocene is a new unit of geological time.


In western Berlin, Devil’s Mountain rises 80 metres above the
surrounding landscape to offer a clear view across the city. Known in
German as Teufelsberg, the tree-covered hill looks primeval, but it
was not there until 70 years ago. It was constructed as a dump for more
than 25 million cubic metres of rubble cleared from the streets after
the Second World War. So it is fitting that this artificial hill had a visit
last year from a group of researchers assessing the geological imprint
of humans on the planet.

The Anthropocene Working Group has a simple name but a very
complicated job. These are the people who have to work out whether the 
world has entered a new slice of geological time — the Anthropocene. 

As the group continues to assess the evidence, the rest of the planet 
has apparently made its decision. Three journals have been launched 
that are dedicated to research on the Anthropocene. Environmental 
advocates have heartily adopted the term and all it signifies, and so have 
many others, including artists and social scientists. And four years ago, 
Nature recommended that geologists formally accept the Anthropo- 
cene, arguing that the term “provides a powerful framework for consid- 
ering global change and how to manage it” (see Nature 473, 254; 2011). 

But although many people have already made up their minds, 
those whose opinions matter the most have yet to do so (see pages 
144 and 171). 

The Anthropocene working group is diverse: about half of the 
three-dozen researchers are geologists, the rest a mix of archaeolo- 
gists, palaeontologists, climate experts, atmospheric scientists and 
representatives of other disciplines. Working without pay over the 
past six years, and communicating mostly by e-mail, they have been 
sifting through evidence and arguments about when the Anthropo- 
cene might have begun, what kind of geological markers might define 
it, and whether it is worthy of recognition as a separate unit in Earth’s 
geological history. 

Despite the popular appeal of the Anthropocene, decisions relating 
to the geological timescale must rest with stratigraphers — research- 
ers who study the evidence embedded in rock, ocean sediments, ice 
cores and other geological deposits. These people must look past the 
clamour and decide whether the Anthropocene is an appropriate new 
unit of chronostratigraphy. Their proposal will then be voted on by the 
International Commission on Stratigraphy (ICS) and the International 
Union of Geological Sciences. 

The process remains conservative because the timescale is a tool 
used by tens of thousands of geoscientists around the world. Changes 
can create confusion, so the ICS requires strong scientific justifica- 
tion for any amendments. The fundamental question for the working 
group and for the ICS is whether geologists would find it sufficiently 
useful to define an Anthropocene unit in the rock record, which is
the physical manifestation of the timescale. The Anthropocene would
probably be an epoch that would sit after the Holocene, which started 
with the end of the last ice age, around 11,700 years ago. 

If the Anthropocene is under way, then when did it start? Initial 
suggestions focused on the Industrial Revolution, but momentum has 
picked up to set the boundary after the Second World War. Since then, 
the global population has increased by 180%, water use by 215% and
energy consumption by 375%. Researchers have called this surge the
Great Acceleration, and it has skewed the composition of the
atmosphere, warmed the planet, eroded the ozone layer and acidified
the oceans. “The last 60 years have without doubt seen the most
profound transformation of the human relationship with the natural
world in the history of humankind,” says the International
Geosphere-Biosphere Programme, which has charted those changes.

It seems obvious that such broad planetary upheavals would warrant 
recognition on the geological timescale. But they may not be ade- 
quately reflected in stratigraphic evidence. In many parts of the globe, 
the geological record of the past 65 years is thin to non-existent. In the 
deep sea, less than a millimetre of sediment has built up, and that could 
be erased as ocean acidity increases. Signs of atmospheric changes are 
also preserved in recently laid down glacial ice, but much of that record 
could disappear in coming centuries as a result of global warming. 

The working group still faces a considerable amount of work to eval- 
uate whether — and how — to define the Anthropocene. If the commit- 
tee or upper levels of the geology hierarchy decide against amending 
the timescale, the Anthropocene will not disappear. Many scientific 
disciplines and the public will continue to use the concept and word, 
in much the same way as they use the terms Neolithic era or Stone Age. 

In the meantime, it is important that stratigraphers be given time 
and space to consider the consequences of formally adopting the 
Anthropocene. Any such change cannot be revisited for at least a dec- 
ade, so the geological community will have to live with its decision for 
some time to come.




In the beginning 


As the first true science journal marks 350 years, 
we must defend scholarly pursuits. 


This month marks the 350th anniversary of arguably the
first and longest-running scientific journal, Philosophical
Transactions: Giving Some Accompt of the Present Undertakings,
Studies, and Labours of the Ingenious in Many Considerable
Parts of the World.

The first volume appeared on 6 March 1665, as a personal project
of Henry Oldenburg, the first Secretary of the Royal Society in
London, and was more of what many would regard as a magazine
— with letters, book reviews and accounts of experiments from
Europe’s growing cadre of natural philosophers. Almost a century
was to elapse before the Royal Society officially took it over and
Phil. Trans. began to take its modern shape.

Part magazine and part journal, Phil. Trans. was much more than
either. It was the journal — a genuinely new innovation — in which
people of inquiring minds started to throw off the shackles of ancient
received opinion and ask their own questions about the world around
them. It was the start of scientific enquiry as we know it today.

By 1887, the breadth of scholarship had grown so much that
Phil. Trans. could not encompass it all in one place. It split into
streams — A and B — to cover separately the mathematical and
physical sciences, and the biological sciences.

The schism was a sign of things to come. Today there are more than 
40,000 scientific journals, from the hieratic to the demotic, the parochial 
to the cosmogonic. The arrival of electronic media is precipitating the 
biggest change in publishing since the invention of printing: journals 
are moving online, and access to knowledge, once the privilege of the 
educated European gentleman, is now increasingly seen as the right of 
any and every person — and rightly so. It would be all too easy to say 
that the only way now is onwards and upwards, as the bright light of 
enlightenment evaporates an ever-shrinking puddle of unreason. 

Three and a half centuries of progress might seem a lot, but it is 
a tiny mote in the piebald passage of human history. Hard fought 
for, broad support for scholarly pursuit of a better world cannot be 
taken for granted. 

The Library of Alexandria in Egypt was targeted and destroyed at 
various times between 48 BC and AD 642. For those inclined to dismiss
such wanton vandalism as ancient history, think of the continuing 
and concerted efforts by many in the United States and elsewhere 
to sweep away science ranging from climate-change research to 
evolution. Consider that, as you read this, Islamist extremists are 
bulldozing the remains of ancient Assyria. 

Even amid an almost uncountable profusion 
of journals, Phil. Trans. continues to thrive. All 
curious minds should wish it another 350 golden 
years. But the forces of irrationality are gaining in 
strength — one cannot afford to be complacent.


WORLD VIEW


Help to fight the battle
for Earth in US schools


Scientists everywhere must champion a set of US education standards
that promote Earth sciences, argues Nicole D. LaDue.


In another embarrassing moment for US scientists, Senator James
Inhofe (Republican, Oklahoma) last month theatrically tossed
a snowball onto the floor of the Senate during a debate on global
warming. Despite all the talk of record temperatures in 2014, he said,
there was snow on the lawn of the Capitol in Washington DC in winter.

Inhofe may or may not be aware of the distinction between weather
and climate. Either way, he is unlikely to alter his views on climate
change. More important is how such messages are received by the
public, and in particular by the millions of schoolchildren who will be
wrestling with the problem of global warming long after Inhofe is gone.

The United States has an opportunity to hugely improve the way
that Earth sciences are taught in its schools. The difference between
weather and climate, for example, could become standard discussion
for third-grade classes, when children are eight or nine years old.
Powerful lobby groups are trying to derail this opportunity. All
scientists should help to stop them.

The quality of Earth-science education in most US schools is
abysmal. I say this as a former high-school Earth-science teacher.
Unlike physics, chemistry and biology, Earth science is typically
taught by those with no adequate training in the subject.

In 2013, new standards were released that could reinvigorate US
science education. Called the Next Generation Science Standards
(NGSS), they were developed by scientists, science-education
researchers and state-education representatives. In the NGSS, Earth
science is on an equal footing
with life science and physical science, from kindergarten through to
the 12th grade (age 17 or 18). High-school students would learn how to
“use a model to describe how variations in the flow of energy into and
out of Earth’s systems result in changes in climate”. Imagine how well
prepared the general public would be for making decisions about, and
planning for, the impacts of climate change, nuclear waste disposal and
investments in energy resources if they could “analyze geoscience data
to make the claim that one change to Earth's surface can create feedbacks
that cause changes to other Earth systems”.

One truly exciting possibility about these standards relates to how
they might be assessed. The testing movement has taken hold of US
public education. Many state tests are predominantly multiple-choice,
driving down the quality of classroom practice to memorization of
facts and cookery-book laboratories. The NGSS will require new ways
to assess both knowledge and scientific thinking. From a teacher's
perspective, this provides an opportunity to teach science well and
to engage students in the process of science, knowing that the
assessments will challenge students to think rather than recall.


Washington DC moved beyond third-grade comprehension of daily 
weather versus average climate and focused on the complex economic 
impacts of climate change that we are already experiencing. This is 
possible if we put our efforts into adopting and implementing the NGSS 
appropriately across the country. 

Under the US constitution, the federal government cannot tell states 
what to teach in schools. Each state must choose to adopt the NGSS, 
through approval by their boards of education or senate. 

Currently, 13 states and Washington DC have adopted the NGSS; this 
covers about 14.5 million of the 50 million or so US students. However, 
state-level politics have blocked adoption in many cases. The National 
Center for Science Education, established to fight those who challenge 
the teaching of evolution and climate science across the United States, 
has been monitoring bills and lawsuits associated 
with the NGSS. In Kansas, there was a lawsuit 
over adoption of the standards because teaching 
evolution and the Big Bang was said to promote 
atheistic viewpoints. In Wyoming, Michigan and 
West Virginia, adoption has been challenged over 
the inclusion of anthropogenic climate change. 

Even in states that have adopted the NGSS, 
hurdles remain. Many districts are looking to 
infuse the Earth-science content into physics, 
chemistry and biology classes, rather than estab- 
lish high-quality Earth-science courses. This 
decision benefits the district because those classes 
prepare students for college-level courses, boosting 
national rankings. The teachers of physics, chem- 
istry and biology are often unprepared for teach- 
ing the Earth-science content. If the NGSS is to succeed, science teachers 
must be trained on the content and develop or adjust the curriculum. 

When scientists learn that my research focus is geoscience educa- 
tion, they lament the state of science literacy in the world around them. 
Certainly, if you are a US scientist, you probably feel that. But please do 
not tell me about your child’s poorly prepared science teacher. Do not 
tell me that your undergraduate students are ill-prepared for college- 
level science. 

Instead, tell me that you have asked your local school board how 
they are implementing the NGSS. Tell me that you have offered to run 
a workshop to teach your local teachers the Earth-science content they 
need. Tell me that your university department has written a letter to 
your state legislature on the importance of implementing Earth science 
in the NGSS to create a scientifically literate public prepared to make 
important decisions and pursue careers in high demand in your region. 
Stop complaining and do something.


Nicole D. LaDue works in the Department of Geology and 
Environmental Geosciences at Northern Illinois University in DeKalb. 
e-mail: nladue@niu.edu 


12 MARCH 2015 | VOL 519 | NATURE | 131 


© 2015 Macmillan Publishers Limited. All rights reserved 


RESEARCH HIGHLIGHTS


Global warming 
could speed up 


The rate of global warming 
could more than double over 
the coming decades, as green- 
house gases build up in Earth’s 
atmosphere. 

Steven Smith and his 
colleagues at the Pacific 
Northwest National 
Laboratory in College Park, 
Maryland, analysed the rate 
of warming in global climate 
simulations, and compared 
them over different 40-year 
periods. The team found that 
the global rate of warming in 
these simulations increases 
to an average of 0.25°C per 
decade by 2020. An analysis 
of palaeoclimate data rarely 
showed rates of temperature 
change above 0.1°C per 
decade during the last 
millennium. 

The Arctic, Europe and 
North America will probably 
see larger increases in 
warming rates than the global 
average. 

Nature Climate Change
http://dx.doi.org/10.1038/nclimate2552 (2015)


PALAEONTOLOGY 


Oldest Homo 
fossil found 


A 2.8-million-year-old jaw- 
bone from Ethiopia may 
represent the earliest fossil 
from the genus Homo yet dis- 
covered — pushing back the 
known origins of humankind 
by nearly 500,000 years. 
The fossil (pictured), 
analysed by Brian 
Villmoare at the 
University of Nevada, 
Las Vegas, William 
Kimbel at Arizona 
State University 
in Tempe and 
their colleagues, 
has key features
of Homo, such as the parabolic
shape of the jaw. But it also
has more primitive traits, such
as the jaw’s overall
size, that are seen in
Australopithecus
afarensis, a human
ancestor that lived
around 3 to 4 million
years ago.
The fossil could belong
to an ancestral Homo
species, the authors say,
filling a gap in the human
fossil record.
Science http://dx.doi.org/10.1126/science.aaa1343
(2015)


Post-menopausal whales lead the hunt


After they reach menopause, female killer
whales help their kin to survive by sharing their
hunting expertise.

Humans, killer whales (Orcinus orca;
pictured) and one other whale species are
the only animals whose females are known to
experience a long post-reproductive life. Female
orcas can live into their 90s, even though they
stop reproducing in their 40s. Darren Croft
at the University of Exeter, UK, and his team
analysed more than 750 hours of video footage
of killer whales off the US Pacific coast collected
between 2001 and 2009. Observations of 102
different whales up to 91 years old showed that
post-reproductive females tended to lead group
hunts for salmon, an important source of food.
This leadership was particularly pronounced in
years when salmon were scarce.

This is the first direct evidence that post-
menopausal females are a source of ecological
know-how, the authors say.

Curr. Biol. http://doi.org/2mx (2015)


ASTRONOMY


Quadruple images 
of supernova 


A rare configuration of cosmic 
objects has produced multiple 
images of an exploding star 
in the same frame. If more 
images of the supernova 
appear, the system could 
provide a new way to measure 
the Universe's growth rate. 
Patrick Kelly at the 
University of California, 
Berkeley, and his colleagues 
discovered the supernova 
kaleidoscope when examining 
images from the Hubble Space
Telescope. 

The multiple images 
occurred because two giant 
objects, a galaxy cluster and 
a galaxy within that cluster, 
acted as cosmic magnifying 
lenses that bent and boosted 
the light from the distant 
supernova. Light rays taking 
different paths around the 
gravitational lenses created 
the four different images. 
These rays took different 
amounts of time to travel their 
respective paths. Measuring 
such differences could help 
astronomers to better estimate 
distances in space and to
measure the expansion of the 
Universe. 

Science 347, 1123-1126 (2015) 


Ultra small 
bacteria spotted 


Bacteria roughly 1/100th the 
volume of a typical Escherichia 
coli have been found in 
groundwater. 

Jillian Banfield at the 
University of California, 
Berkeley, Luis Comolli of 
Lawrence Berkeley National 
Laboratory in California, 
and their colleagues filtered 
groundwater through a 
mesh with holes around 
0.2 micrometres in diameter 
and collected a variety of 
extremely small bacteria 
(around 0.009 cubic 
micrometres) that have 
never been cultured. Under 
the electron microscope, 
the microbes seemed to 
have tightly packed DNA, 
few of the protein-making 
structures called ribosomes, 
and structures that might 
allow the cells to connect 
and communicate with one 
another. 

The researchers suggest that 
these bacteria had not been 
cultured before because they 
depend on other microbes to 
grow. 

Nature Commun. 6, 6372 (2015) 


PHOTONIC MATERIALS 


Pulled fibres 
shift colour 


Rubbery fibres have been 
developed that reversibly 
change colour when stretched 
or bent. 

Xuemei Sun, Huisheng 
Peng and their collaborators at 
Fudan University in Shanghai, 
China, attached microscopic 
plastic spheres to elastic 
fibres that were wound with 
carbon nanotubes. As the fibre 
stretches, the spaces between 
the microspheres increase in 
size along the length of the 
fibre, whereas they decrease 
in the radial direction. This 
changes the wavelengths of
light that are reflected by the
fibres, resulting in shifts in 
colour between red, green and 
blue as the fibre is stretched and 
released. The fibres remained 
stable after 1,000 rounds of 
stretching and were woven into 
fabric in various patterns. 

Such ‘mechanochromic’ 
materials could be used in 
wearable displays or sensors, 
the authors say. 

Angewandte Chemie http://doi.org/f259np (2015)


GEOLOGY 


Hydration lifts 
Earth’s crust 


The high elevation of parts of 
the western United States could 
be a result of water percolating 
up from deep in Earth’s crust, 
and changing the crust’s 
mineral composition, making 
the rocks more buoyant. 

Geologists have been 
hard-pressed to explain 
why Colorado and much of 
Wyoming have lifted by more 
than 2 kilometres over the past 
75 million years. A team led by 
Craig Jones at the University of 
Colorado Boulder reanalysed 
data on the geology and 
seismology of the region and 
concluded that in lower regions,
such as Montana, fragments 
of crustal rock contain dense 
minerals such as garnet. 
Beneath high-elevation areas, 
however, the rocks contain 
a different suite of less dense 
minerals. The authors suggest 
that these were produced by 
water reacting with the dense 
minerals and so making the 
crust lighter. 

The water may have come 
from the dehydration of a 
deeply buried, ancient crustal 
slab. 

Geology http://doi.org/2ps 
(2015) 


STRUCTURAL BIOLOGY 


X-rays reveal 
virus innards 


With the help of powerful 
X-rays, researchers have 
determined the three- 
dimensional structure of a 
single giant virus particle. This 
shows how tiny objects that
cannot be easily crystallized
can still be imaged in 3D.
X-ray crystallography is
commonly used to work out
the structure of molecules, but
these must be crystallized first.
However, free-electron lasers
generate such high-energy
X-ray pulses that they can, in
theory, produce pictures of
just a single molecule.

Tomas Ekeberg at Uppsala
University in Sweden and
his colleagues fired these
lasers at single particles of
the Acanthamoeba polyphaga
mimivirus. They used
algorithms to combine X-ray
diffraction patterns from
many specimens and created
a 125-nanometre-resolution
image of the virus (pictured).

The results confirm
that the mimivirus is less
densely packed with genetic
material than smaller viruses
tend to be.

Phys. Rev. Lett. 114, 098102
(2015)


SOCIAL SELECTION


Popular articles
on social media


Scientific art kicks off Twitter storm


Images of painted pterosaurs, ceramic diatoms and quilts
depicting neurons flooded scientists’ Twitter feeds, after the
writers of Symbiartic, Scientific American’s art blog, launched
SciArt Week on 1 March. Researchers and artists posted a
flurry of artwork highlighting the beautiful side of science,
using the hashtag #sciart.

Malcolm Campbell, a plant scientist at the University
of Toronto, Scarborough, Canada, was one of the first
researchers to announce SciArt week on
Twitter. “Art captures the imagination in
a way that science alone cannot,” he says.
“It’s a wonderful way to make science
more tangible to the public.”


CLIMATE-CHANGE BIOLOGY 


Insects feast 
under high CO₂


Leaf-eating insects in northern 
temperate forests consume 
more of the forest canopy 
when carbon dioxide levels
are increased, which could
limit forests’ capacity to act
as carbon sinks in a warming
world.

John Couture and his 
colleagues at the University of 
Wisconsin-Madison found
that in parts of a research
forest exposed to raised CO₂
levels, herbivorous insects 
increased their consumption 
of foliage by 88%. This led 
to an average of 70 grams of 
carbon-sequestering biomass 
lost per square metre of forest 
per year. 

Increased CO₂ could be
causing this effect by changing 
the nutrient content of leaves 
and also by boosting the 
number of leaf-eating insects, 
the authors say. They also 
suggest that insect behaviour 
should be incorporated into 
models that estimate the 
effects of high CO₂ on forest
productivity. 

Nature Plants http://dx.doi.org/10.1038/nplants.2015.16
(2015)
SEVEN DAYS


EVENTS 


Dawn arrival 

NASA’s Dawn spacecraft
slipped into the gravitational 
pull of Ceres on 6 March, 
making it the first probe to 
visit a dwarf planet. At nearly 
1,000 kilometres across, Ceres, 
located in the asteroid belt, is 
one of the largest unexplored 
worlds in the Solar System. 
Dawn will orbit Ceres for the 
next 15 months, gathering 
information about the large 
amounts of water thought to 
lurk within the asteroid. The 
craft also visited the asteroid 
Vesta in 2011-12; its arrival
at Ceres also makes Dawn the
first probe to have orbited two
celestial bodies. See go.nature.com/uwg9fb
for more.


Animal research 
More than 120 research 
institutes, organizations and 
societies in Europe called on 
the European Commission 
on 4 March to oppose
an initiative calling for a
complete ban on research 
using animals. Animal- 
rights activists submitted a 
petition to the commission 
on 3 March signed by more 
than 1.1 million citizens. As 
part of a European Citizens’ 
Initiative, the petition opens a 
procedure for a hearing in the 
European Parliament, and for 
reconsideration of legislation. 
In a joint statement, the 
bodies supported the current 
legislation, saying that it 
guarantees high standards of 
animal welfare while allowing 
crucial health research. 


Tardy, weak El Niiio 
A weak El Nifio pattern 

has developed several 

months later than normal 

in the equatorial Pacific 
Ocean, forecasters with the 
US National Oceanic and 
Atmospheric Administration 
(NOAA) announced on 

5 March. Marked by warmer 


Ivory stockpile burns in Kenya 


Fifteen tonnes of ivory were burned in Nairobi 
National Park on 3 March, as Kenya became the 
latest country to destroy seized stocks to deter 
elephant poachers. At the burn, President Uhuru 
Kenyatta said that the country would soon 


than average waters, El Nifio 
conditions can have far- 
flung consequences, from 
greater precipitation in the 
southeastern United States to 
droughts in southeast Asia. 
NOAA says that there is a 
50-60% chance that El Nifio 
conditions will continue into 
the Northern Hemisphere 
summer, but that the system 
is too weak and too late to 
have major global impacts. 
See go.nature.com/qbmdci for 
more. 


ITER head 


Bernard Bigot was appointed 
director-general of ITER, a 
project to build the world’s 
biggest nuclear-fusion 
reactor in southern France, 
at a meeting in Paris on 

5 March. Bigot, who retired 
as chairman of the French
Alternative Energies and 
Atomic Energy Commission 
(CEA) in January, was 
nominated for the ITER post 
last November (see Nature 
http://doi.org/2q3; 2014). 

He has promised reforms of 
ITER’s complex multinational 
management, and to address 
the project’s schedule 
slippages and cost increases. 
Bigot begins his five-year term 
immediately. 


Cancer chief 

The director of the US 
National Cancer Institute 
(NCI), Harold Varmus, 
announced on 4 March that he 
will step down after five years 
in the post. Varmus will leave 
the centre, part of the National 
Institutes of Health (NIH), 

at the end of the month, and 
plans to open a lab at the 
destroy the rest of its stockpile, too. The country’s
previous president burned around 5 tonnes of 
ivory in 2011. China said last month that it would 
ban all imports of ivory, as poaching continues to 
kill hundreds of elephants in Africa every week. 


Weill Cornell Medical College 
in New York City. He was 
director of the NIH from 1993 
to 1999, and won the 1989 
Nobel Prize in Physiology or 
Medicine for his work on the 
role of retroviruses in cancer. 
NCI deputy director Douglas 
Lowy will serve as interim 
chief until a replacement is 
appointed. See go.nature.com/ 
sv4ful for more. 


Cancer-drug firm 
Pharmaceutical firm AbbVie 
agreed to pay US$21 billion 
to purchase Pharmacyclics, 

a company that specializes 

in cancer drugs, in a deal 
announced on 4 March. 
Pharmacyclics, based in 
Sunnyvale, California, makes 
Imbruvica (ibrutinib), a 
blood-cancer drug that targets 
a protein called Bruton’s
tyrosine kinase, and which 
brought in $548 million 

in 2014. AbbVie, of North 
Chicago, Illinois, plans to 
close the deal in the middle 
of 2015. 


‘Biosimilar’ drug 
The US Food and Drug 
Administration awarded its 
first approval to a ‘biosimilar’ 
drug on 6 March. The drug, 
Zarxio (filgrastim-sndz), 

is similar to a previously 
approved protein used to 
prevent infections following 
cancer chemotherapy. 

The Zarxio decision could 
herald the approval of other 
biosimilars, and reduce health- 
care costs. Zarxio, made by 
the generics arm of the Swiss 
pharmaceutical firm Novartis, 
was approved in Europe in 
2009, but the United States 
has struggled to formulate 
regulations governing 
biosimilars. See go.nature. 
com/omxrup for more. 


Solar plane 


Swiss pilots launched an 
attempt on 9 March to fly 
around the world in a plane 
powered only by solar energy. 
Bertrand Piccard and André 
Borschberg began their trip 
in the experimental plane 
Solar Impulse 2 (pictured) in 
Abu Dhabi. The plane, which 
has a wingspan wider than 


TREND WATCH 


The costs to Africa of adapting 

to climate change could rise 

to between US$50 billion and 
$100 billion per year by 2050, 
depending on global efforts to 
reduce greenhouse-gas emissions, 
the United Nations Environment 
Programme reported on 4 March. 
It estimates that current annual 
financial aid is just $1 billion to 
$2 billion. Levies on sectors such 
as tourism and banking could 
raise $4.8 billion per year, but even 
if current policies keep warming 
to below 2°C, costs could still 
outpace revenue as early as 2020. 


that of a jumbo jet but is the
weight of a small car, uses
more than 17,000 solar cells
and rechargeable lithium-ion
batteries to fly for several
days and nights in a row. The
five-month trip will include
passing over both the Atlantic
and Pacific oceans.


FUNDING

Australian crisis
Much of Australia’s
shared national research
infrastructure is under
threat of closure because
of uncertainty over
whether it will receive
the Aus$150 million
(US$116 million) allocated
by the government last year.
Organizations representing
Australian scientists wrote
an open letter to Australia’s
Prime Minister Tony
Abbott on 4 March warning
of the crisis. Twenty-seven
facilities under the
National Collaborative
Research Infrastructure
Strategy, employing some
1,700 research staff, will close
if funding does not come
through. The cash is tied to
controversial legislation on
higher-education reform that
has not yet passed through
parliament. See go.nature.com/3q8eiq
for more.


Brain project
The European Commission
has recommended changes
to the governance of Europe's
€1-billion (US$1.1-billion)
Human Brain Project, which
brings together neuroscience
and computing. A summary
report published by the
commission on 6 March states
that the decision-making
processes need to be made
“simple, fair and transparent”.
Similar recommendations
were made on 9 March by
an independent mediation
committee that was analysing


CLIMATE-CHANGE COSTS 


Adapting to the effects of climate change will cost Africa dearly, 
even if the world acts on warming. 


[Chart: adaptation cost (US$ billion, 2012 constant) for warming above 4°C and for warming below 2°C, compared with a medium funding-levy scenario and the average climate funding for 2010-12.]


SEVEN DAYS | THIS WEEK | 


14-18 MARCH 

The decadal UN World 
Conference on Disaster 
Risk Reduction will take 
place in Sendai, Japan. 

It aims to help countries 
prepare for disasters. 
go.nature.com/opisic 


15 MARCH 

NASA's Super Pressure
Balloon is scheduled to 
launch after this date 
from Wanaka, New 
Zealand. The research 
balloon aims to break 
the previous record of 
54 days in flight, and 
will also test technology 
developed by NASA 
over 15 years. 


17-21 MARCH 

The UN World 
Conference on Tobacco 
or Health takes place in 
Abu Dhabi. The event, 
which occurs every 
three years, will focus 
on the link between 
tobacco use and non- 
communicable diseases 
that kill 38 million 
people each year. 
go.nature.com/smiiek 


deep rifts in the project. See 
go.nature.com/knoaqq for 
more. 


EU funding scrutiny 
Bulgaria has agreed to have 

its deficient research system 
scrutinized by a group of 
international science-policy 
experts on behalf of the 
European Commission. The 
review, scheduled to begin in 
April, will be the first carried 
out under the auspices of 

the commission's Policy 
Support Facility, a €20-million 
(US$22-million) programme 
launched on 3 March with the 
goal of strengthening science 
and innovation capacities in 
the European Union. 
Health workers swab a pigeon in a market in Changsha, China.


Flu genomes trace 
H7N9 evolution 


But surveillance of avian influenza viruses is patchy and slow. 


BY DECLAN BUTLER 


No one knows the pandemic potential
of the H7N9 avian influenza that has
infected more than 560 people in China 
and killed 204 since it was first detected in 
March 2013. But the largest-ever genomic sur- 
vey of the virus in poultry now provides a more 
detailed picture of its evolution and spread. 
Such information can help to target control 
efforts, and to monitor the evolution of the virus. 
But an analysis of sequences submitted to the 
GenBank repository in the past 15 years suggests 
that genetic surveillance of avian flu viruses in 
birds is patchy and less than prompt. 
For now, H7N9 flu does not spread eas- 
ily among people. But as with many bird-flu 


viruses, a concern is that it could evolve to do so. 

In a paper published on Nature's website this
week (T. T.-Y. Lam et al. Nature http://dx.doi. 
org/10.1038/nature14348; 2015), an inter- 
national team of researchers describes how it 
tracked the virus from October 2013 to July 
2014 by taking swabs from poultry at live-bird 
markets in 15 cities over 5 provinces in eastern 
China. The group detected the virus in markets 
in seven cities and in 3% of samples on average. 

The team then sequenced the genomes 
of 438 viral isolates and found that as the 
virus spread south, it evolved into three main 
branches, with multiple sub-branches. 

Such diversification is expected, but tracking 
it can help to identify the main trade routes and 
markets that fuel a virus’s spread. “The extent of 


viral transmission among chickens was largely 
unclear until our paper showed that the virus 
had diverged into regional lineages,” says Yi
Guan, a co-author of the paper and a virologist 
at the State Key Laboratory of Emerging Infec- 
tious Diseases in Shenzhen, China. “Eastern 
China remains as a reservoir and ‘distribution 
centre’ for this virus,” he says. 

Despite such insights, relatively few 
sequences of H7N9 have been collected. 
Sequences from only eight H7N9 viral iso- 
lates collected from birds in 2014 have been 
deposited in GenBank. That is not enough for 
geographical mapping of the virus over time, 
says Marius Gilbert, an avian-flu epidemiolo- 
gist and ecologist at the French-speaking Free 
University of Brussels. 

Nor is the latest paper up to date. A new win- 
ter wave of H7N9 is under way, and probably 
has different patterns of spread. 

In 2012, Nature’s news team reported that 
genetic surveillance of animal-flu viruses is 
patchy globally: most genomes are sequenced 
months or years after collection (see Nature 483, 
520-522; 2012). Current GenBank data suggest 
that this is still true. Far more flu sequences are 
being deposited in GenBank, but many are from 
samples collected some time ago. 

Guan agrees that timely monitoring is impor- 
tant. But surveillance and viral sequencing are 
costly and time-consuming, and for H7N9 
require access to a biosafety-level-3 lab. Given 
the complications, Guan thinks that the number 
of recent H7N9 sequences is not grossly low. 

Adding to the time lag, public authorities and 
researchers who sequence flu strains sometimes 
make the data public only when, or if, they pub- 
lish — so sequences can languish. The authors 
of the latest study have sent sequences to Gen- 
Bank and had already shared the data with the 
World Health Organization and other bodies. 

Guan and his co-authors warn that H7N9 
“should be considered as a major candidate to 
emerge as a pandemic strain”. But predicting 
pandemic potential is an embryonic science. 
Last year, a prominent international group of 
researchers argued that there is little evidence 
that flu viruses that cause sporadic human infec- 
tions are a greater pandemic threat than viruses 
that have not yet infected humans (C. A. Russell 
et al. eLife 3, e03883; 2014). But Guan says that
given the vast number of flu viruses, it is neces- 
sary to prioritize targets for control and vaccine 
development — and that H7N9 should be high 
on that list.
Mistrust and meddling
unsettles US science agency


National Science Foundation under pressure from lawmakers to revise its agenda. 


BY BOER DENG 


The US National Science Foundation
(NSF) has had a tough couple of years.
Republicans in the US Congress have
put the agency under the microscope, questioning
its decisions on individual grants
and the purpose of entire fields of study. The
agency was without a permanent director for a
year, and it is now planning an expensive, and
controversial, move to new headquarters.

As she prepares to mark one year at the 
agency’s helm, astrophysicist France Cordova 
is carefully navigating these challenges. “I used 
to be a mountaineer,” she says. “It’s all about 
looking at every move and how you can best 
do it so that you don’t take a fall.” But many
researchers worry that Congress has begun 
to interfere with the scientific process. As 
mistrust grows, the NSF is caught between 
the scientists it serves and the lawmakers it 
answers to. 

Cordova has moved aggressively to repair 
relations with Congress. Aides to lawmakers 
who participated in a December trip to NSF 
facilities in Antarctica say that the journey 
was successful. And to address concerns about 
transparency, the agency has instituted guide- 
lines that should make its grant summaries 
easier to understand. 

But such efforts seem to have had little influ- 
ence on an investigation of the NSF’s funding 
decisions by Representative Lamar Smith 
(Republican, Texas), chairman of the House 
Committee on Science, Space, and Technol- 
ogy. Since he took the job two years ago, Smith 
has sought to root out what he sees as wasteful
spending by the US$7-billion NSF. He has
introduced legislation that would require the 
agency to certify that every grant it awards is 
in the “national interest”, and he has repeatedly
sought, and been given, confidential informa- 
tion about individual NSF grants — albeit 
in redacted form. On at least four occasions, 
staff from the science committee travelled to 
the NSF's headquarters in Arlington, Virginia, 
to review such documents, most recently on 
28 January. 

“There is a sense of exhaustion among 
researchers as this has continued,” says 
Meghan McCabe, a legislative-affairs analyst 
at the Federation of American Societies for 
Experimental Biology in Bethesda, Maryland. 
National Science Foundation head France Cordova (left) is trying to improve lawmakers’ view of the agency.


An NSF programme director who asked not 
to be named is more direct: “Having them in 
our building questioning our work like that felt 
like an attack.”

But Cordova argues that the political land- 
scape has changed and the NSF must adapt. 
“Congress absolutely has the right to request 
whatever materials for oversight they want,” she
says. “Just because we're not used to it doesn't 
mean it’s a violation.” At a House subcommittee
hearing in February, Cordova told lawmakers 
that she supports Smith’s proposal to require 
that NSF grants support the national interest. 
(The NSF already judges grant applications on 
their potential “broader impacts” as well as on 
scientific merit; in December, it began asking 
applicants to articulate how their projects serve 
the national interest, as defined by the agency's 
mission statement.) 

Among scientists, however, there is anxi- 
ety that Cordova has been too conciliatory 
towards critics in Congress. “Once you start to 
compromise, you’re just inviting harassment,”
says Lloyd Etheredge, a social scientist at the 
Policy Sciences Center in Bethesda. 
Some argue that the concessions to Congress
will compromise the agency's peer-review pro- 
cess. “If the NSF is funding a grant, it should 
by definition be in the national interest,” says 
John Bruer, president emeritus of the James S. 
McDonnell Foundation in St Louis, Missouri, 
who in 2011 led an NSF task force on grant cri- 
teria. “When you add stuff about the national 
interest, you are potentially inviting criteria 
apart from judging the best science.” 


MANY PRIORITIES 

The agency’s ongoing struggle with Congress 
has left Cordova with less time to deal with 
internal challenges, such as employees who are 
disgruntled by a 2013 decision to move NSF's 
headquarters from a suburb close to Washington
DC to a site that is farther away and has
smaller facilities for some staff. In October, a 
federal government arbitrator sided with an 
NSF employee union and ordered the agency 
to revise its design to accommodate large, indi- 
vidual workspaces in the new headquarters. 
Cordova has sought to address unrest about the 
move through a series of meetings and working
groups, but rumours persist that many senior
employees will opt to retire rather than relocate. 

That would be a significant blow to an 
agency that is already stretched. The NSF's 
budget has grown slowly but steadily in recent 
years, reaching $7.3 billion in fiscal year 2015. 
But even though the number of grant proposals 
submitted to the agency has risen by 65% over 
the past 15 years, the NSF has seen only a 20% 
increase in the number of full-time employees. 


The resulting increase in workload has 
affected staff morale. A 2014 survey by the 
US Office of Personnel Management found 
that only 45% of NSF employees felt that the 
agency's leadership generated “high levels of 
motivation and commitment in the workforce”,
compared with 53% in 2010. And just
over one-third of workers were negative about 
the opportunities available for getting a better 
job at the agency. 


IN FOCUS | NEWS 


As Cordova enters the second year of her 
six-year term, the challenges ahead are clear. 

Eugene Skolnikoff, a political scientist at 
the Massachusetts Institute of Technology in 
Cambridge, says that winning and maintain- 
ing the trust of the scientific community gives 
an NSF director clear authority to negotiate 
with Congress. “The best NSF directors,” he 
says, “have been the ones who really got the 
staff and the scientists behind their vision.”
DNA clock proves tough to set


Geneticists meet to work out why the rate of mutation in the human genome is hard to pin down. 


BY EWEN CALLAWAY 


Mathematicians keep refining π even
though they know it to more than
12 trillion digits; physicists beat
themselves up because they cannot pin down 
the gravitational constant beyond three signifi- 
cant figures. Geneticists, by contrast, are hav- 
ing trouble deciding between one measure of 
how fast human DNA mutates and another 
that is half that rate. 

The rate is key to calibrating the ‘molecular 
clock’ that puts DNA-based dates on events in 
evolutionary history. So at an intimate meet- 
ing in Leipzig, Germany, on 25-27 February, a 
dozen speakers puzzled over why calculations 
of the rate at which sequence changes pop up 
in human DNA have been so much lower in 
recent years than previously. They also pon- 
dered why the rate seems to fluctuate over 
time. The meeting drew not only evolutionary 
geneticists, but also researchers with an inter- 
est in cancer and reproductive biology — fields 
in which mutations have a central role. 

“Mutation is ultimately the source of all her- 
itable diseases and all biological adaptations, 
so understanding the rate at which mutations 
evolve is a fundamental question,” says Molly 
Przeworski, a population geneticist at Colum- 
bia University in New York City who attended 
the Human Mutation Rate Meeting. 

Researchers tried to put a number on 
the human mutation rate even before they
knew that genetic information is encoded
in DNA. In the 1930s, pioneering geneticist 
J. B. S. Haldane came up with a good estimate 
by measuring how the mutations responsible 
for haemophilia appeared in extended families. 

Later estimates of the mutation rate counted 
the differences between stretches of DNA and 
protein amino-acid sequences in humans 
and those in chimpanzees or other apes, and 
then divided the number of differences by the 
time that has elapsed since the species’ most 
recent common ancestor appeared in the fossil
record. These estimates were clouded
by the patchiness of the fossil record, but
researchers eventually settled on a consensus:
each DNA letter, on average, mutates once every billion
years. That is a “suspiciously round number”,
molecular anthropologist Linda Vigilant of 
the Max Planck Institute for Evolutionary 
Anthropology in Leipzig told Nature in 2012 
(see Nature 489, 343-344; 2012). 
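
The divide-by-elapsed-time calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1.2% divergence and the 6-million-year split are placeholder inputs chosen to reproduce the round figure of one change per billion years, not values from the studies discussed, and the factor of two (changes accumulate on both lineages after a split) is a common convention rather than something the article specifies.

```python
import math


def rate_from_divergence(diff_per_site: float, split_years: float) -> float:
    """Per-site, per-year mutation rate implied by a pairwise divergence.

    Assumption of this sketch: differences accumulate independently on
    both lineages, hence the factor of 2 in the denominator.
    """
    return diff_per_site / (2.0 * split_years)


def split_from_rate(diff_per_site: float, rate_per_year: float) -> float:
    """Invert the same relation: date a split from an assumed rate."""
    return diff_per_site / (2.0 * rate_per_year)


# Illustrative placeholder inputs (hypothetical, not from the article).
DIVERGENCE = 0.012   # fraction of DNA letters that differ between two species
SPLIT_YEARS = 6e6    # fossil-based date for the common ancestor

fast_rate = rate_from_divergence(DIVERGENCE, SPLIT_YEARS)
slow_rate = fast_rate / 2  # a rate "half as fast"

print(f"fast clock: {fast_rate:.2e} changes per letter per year")
print(f"split dated with fast clock: {split_from_rate(DIVERGENCE, fast_rate):.3g} years")
print(f"split dated with slow clock: {split_from_rate(DIVERGENCE, slow_rate):.3g} years")
```

Because the inferred date is inversely proportional to the rate, halving the assumed rate doubles every date calculated from the same sequence differences, which is why the choice of clock matters so much for evolutionary timelines.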

In the past six years, more-direct measurements
using ‘next-generation’ DNA
sequencing have come up with quite different 
estimates. A number of studies have compared 
entire genomes of parents and their children — 
and calculated a mutation rate that consistently 
comes to about half that of the last-common- 
ancestor method. 


A slower molecular clock worked well to 
harmonize genetic and archaeological estimates 
for dates of key events in human evolution, such 
as migrations out of Africa and around the rest 
of the world¹. But calculations using the slow
clock gave nonsensical results when extended 
further back in time — positing, for example, 
that the most recent common ancestor of apes 
and monkeys could have encountered dino- 
saurs. Reluctant to abandon the older num- 
bers completely, many researchers have started 
hedging their bets in papers, presenting multi- 
ple dates for evolutionary events depending on 
whether mutation is assumed to be fast, slow or 
somewhere in between. 

Last year, population geneticist David 
Reich of Harvard Medical School in Boston, 
Massachusetts, and his colleagues compared 
the genome of a 45,000-year-old human from 
Siberia with genomes of modern humans and 
came up with the lower mutation rate². Yet just
before the Leipzig meeting, which Reich co-organized
with Kay Prüfer of the Max Planck
Institute for Evolutionary Anthropology, his
team published a preprint article³ that calculated
an intermediate mutation rate by looking
at differences between paired stretches of
chromosomes in modern individuals (which, 
like two separate individuals’ DNA, must 
ultimately trace back to a common ancestor). 
Reich is at a loss to explain the discrepancy. 
“The fact that the clock is so uncertain is very 
problematic for us,” he says. “It means that the 
dates we get out of genetics are really
quite embarrassingly bad and uncertain.” 
Reich hoped that even if the meeting did 
not reach a consensus on mutation rate, it 
would highlight the research that is needed 
to move forward. He and Prüfer kicked off
the meeting by polling attendees on their 
favoured rate, and found that the lower 
figure had gained popularity, but there was 
still a wide spread of opinions. 
Increasingly, Reich and others conclude 
that the human mutation rate has fluctu- 
ated over millions of years. Much of the 
discussion at the meeting revolved around 
when it accelerated and decelerated — and 
why. Evolutionary changes in metabolism 
or reproductive biology are both possible 
causes. Aylwyn Scally, a population geneti- 
cist at the University of Cambridge, UK, 
thinks that the common ancestor of great 
apes, which lived between 20 million and 
12 million years ago, had longer generations 
than its relatives on the monkey branch of 
the primate family tree. That would have 
slowed mutation: a longer generation would 
lead to fewer mutations per year, on average. 
Medical-minded geneticists also fret 
about mutation rates. Meeting attendee 
Michael Stratton, director of the Wellcome 
Trust Sanger Institute in Hinxton, UK, is 
a cancer geneticist who studies the causes 
of DNA mutations. Environmental agents 
such as tobacco smoke trigger some can- 
cers, but others are caused by the normal 
biochemical operations of cells — through 
processes that are little-known, says Strat- 
ton. Working out what these are could 
explain fluctuations in the mutation rate. 
Reproductive biologists are also inter- 
ested in the human mutation rate — in part 
because they have found that some diseases 
are more common in the children of older 
men than of younger ones. Sperm are pro- 
duced throughout a man’s life, whereas 
women are born with a full array of eggs. 
The constant division of sperm precur- 
sor cells means that men tend to pass on 
more new mutations to their offspring than 
women (four times as many, according to
a 2012 estimate⁴), and older fathers transmit
more mutations than young ones. This
means that changes in the biology of sperm 
production or paternal age over evolution- 
ary time could influence mutation rate. 
Even though the human mutation rate 
is still uncertain and unstable, Reich pro- 
posed at the meeting that researchers use 
the slower value for their work, at least until 
better data come along. Just don't think of it 
as a constant, he cautions: “This is not the 
speed of light. This is not physics.”


1. Scally, A. & Durbin, R. Nature Rev. Genet. 13, 745-753 (2012).
2. Fu, Q. et al. Nature 514, 445-449 (2014).
3. Lipson, M. et al. Preprint at http://dx.doi.org/10.1101/015560 (2015).
4. Kong, A. et al. Nature 488, 471-475 (2012).
The Grytviken whaling station on South Georgia island in the First World War. It has long been abandoned.


MARINE ECOLOGY 


World’s whaling 
slaughter tallied 


Commercial hunting wiped out almost three million 


animals last century. 


BY DANIEL CRESSEY 


The first global estimate of the number of
whales killed by industrial harvesting
last century reveals that nearly 3 million
cetaceans were wiped out in what may have
been the largest cull of any animal — in terms 
of total biomass — in human history. 

The devastation wrought on whales by 
twentieth-century hunting is well documented. 
By some estimates, sperm whales have been 
driven down to one-third of their pre-whaling 
population, and blue whales have been depleted 
by up to 90%. Although some populations, such 
as minke whales, have largely recovered, others 
— including the North Atlantic right whale and 
the Antarctic blue whale — now hover on the 
brink of extinction. 

But researchers had hesitated to put a 
number on the global scale of the slaughter. 
That was largely because they did not trust 
some of the information in the databases of 
the International Whaling Commission, the 
body that compiles countries’ catches and that 
manages whaling and whale conservation, says 
Robert Rocha, director of science at the New 
Bedford Whaling Museum in Massachusetts. 
Rocha, together with fellow researchers
Phillip Clapham and Yulia Ivashchenko of the 
National Marine Fisheries Service in Seattle, 
Washington, has now done the maths, in a 
paper published last week in Marine Fisher- 
ies Review (R. C. Rocha Jr, P. J. Clapham and 
Y. V. Ivashchenko Mar. Fish. Rev. 76, 37-48; 
2014). “When we started adding it all up, it was 
astonishing,” Rocha says.

The researchers estimate that, between 1900 
and 1999, 2.9 million whales were killed by the 
whaling industry: 276,442 in the North Atlan- 
tic, 563,696 in the North Pacific and 2,053,956 
in the Southern Hemisphere. Other famous 
examples of animal hunting may have killed 
greater numbers of creatures — such as hunt- 
ing in North America that devastated bison and 
wiped out passenger pigeons. But in terms of 
sheer biomass, twentieth-century whaling beat 
them all, Rocha estimates. 
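
As a quick check on the arithmetic, the three regional figures quoted above can be summed directly. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Regional catch totals for 1900-99, as quoted in the article.
catches = {
    "North Atlantic": 276_442,
    "North Pacific": 563_696,
    "Southern Hemisphere": 2_053_956,
}

# Sum the regions and work out each region's share of the total.
total = sum(catches.values())
shares = {region: count / total for region, count in catches.items()}

print(f"Total whales killed: {total:,} (about {total / 1e6:.1f} million)")
for region, share in shares.items():
    print(f"{region}: {share:.0%} of the total")
```

The sum rounds to the 2.9 million reported, and the Southern Hemisphere accounts for roughly seven in every ten whales taken.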

“The total number of whales we killed is a 
really important number. It does make a differ- 
ence to what we do now: it tells us the number 
of whales the oceans might be able to support,” 
says Stephen Palumbi, a marine ecologist at 
Stanford University in California. He thinks that 
2.9 million whale deaths is a “believable” figure. 
SOURCE: MAR. FISH. REV. 76, 37-48 (2014)


Sail-powered whaling ships took around 
300,000 sperm whales between the early 
1700s and the end of the 1800s. But with the 
aid of diesel engines and exploding harpoons, 
twentieth-century whalers matched the previ- 
ous two centuries of sperm-whale destruction 
in just over 60 years. The same number again 
were harvested in the following decade. As one 
whale species became depleted, whalers would 
switch to another (see ‘The largest hunt’). Most
commercial hunting was put on hold only in 
the 1980s. 

“It's an eye-opener for people to understand 
just how many whales were killed in the twen- 
tieth century alone. It shows how methodi- 
cal and efficient whalers were,” says Howard 
Rosenbaum, a cetacean researcher who runs 
the Ocean Giants Program at the Wildlife 
Conservation Society, a non-governmental 
organization headquartered in New York City. 

The latest estimate depended on detective 
work by Ivashchenko, who documented a 
huge illegal whaling operation in the North- 
ern Hemisphere by the former Soviet Union for 
her 2013 doctoral thesis. Through interviews 
with former Soviet whalers and researchers, 
and reports from the whaling industry that she 
uncovered, she found that more than half a million
whales had been caught by Soviet vessels,
and that 178,811 of those were never declared 
to the International Whaling Commission. 


THE LARGEST HUNT 




Industrial whaling vessels killed nearly 2.9 million animals of various species in the twentieth century. Most 
were fin and sperm whales, but blue, sei, humpback and minke whales were also taken in their thousands. 
Some researchers have used genetic data
on certain populations to estimate how many 
whales existed before human hunting began. 
But the genetics has often suggested much larger 
original populations than the whaling records 
imply, says Rosenbaum. The estimates are now 
creeping closer together, he adds, as the genetics 
work improves and the catch data are revised 
upwards with inclusion of the true Soviet figures
and other revisions. Understanding how
many whales were taken from the oceans might
mean that targets that define when a species has 
recovered need to be changed, he says. 

Rocha adds that 2.9 million whales is a lower 
bound. Although motorized boats were more 
efficient than the original sailing vessels in cap- 
turing whales, some of the animals they mor- 
tally wounded would escape or not make it onto 
official records. “The actual number of whales 
killed is going to be more,” he says.
The world’s most powerful particle collider is poised to
roar once again into action after a two-year hiatus. At 
the end of March, the Large Hadron Collider (LHC) 
at CERN, Europe's particle-physics lab near Geneva, Switzer- 
land, will start smashing particles together at a faster rate and 
with higher energies than ever before. “We're standing on the 
threshold of a completely new view of the Universe,” says Tara
Shears, a particle physicist at the University of Liverpool, UK. 
The first run began in earnest in November 2009 and ended 
in February 2013. The LHC collided particles — mainly protons 
but also heavier particles such as lead ions — at high enough 
energies to discover the Higgs boson in 2012, which garnered 
those who predicted the subatomic particle a Nobel prize. 


[Graphic: ‘Desperately seeking SUSY’, with labels for the top quark, standard-model particles and hypothetical SUSY particles.]


A new view of the Universe


In the next run, set to last three years, energies will rise to an
eventual 14 teraelectronvolts (TeV; see ‘Hardware rebooted’).
One hope is that higher energies will produce evidence for
supersymmetry, an elegant theory that could extend the standard
model of particle physics (see ‘Desperately seeking SUSY’).
They could also shake out particles of dark matter, the invisible
substance that is thought to make up 85% of the matter in the
Universe (see ‘Decays decoded’).

More collisions will enable more-precise study of the Higgs’
nature (see ‘The Higgs factory’) and will provide clarity on
anomalies hinted at in run 1 (see ‘Known unknowns’).

“In the first run we had a very strong theoretical steer to look
for the Higgs boson,” says Shears. “This time we don't have any
signposts that are quite so clear.”


LHCb 


Some theories 
suggest that the stop 
would be the lightest 
SUSY ‘squark’, 
making it the easiest 
to detect because it 
would show up in 
lower-energy collisions 
than the others. 


The LHC is a 27-kilometre ring that 
circulates beams of protons accelerated 
to near the speed of light in opposite 
directions. At four points, the two beams 
collide, creating showers of particles 


BY ELIZABETH GIBNEY / ILLUSTRATION BY NIK SPENCER 


Higher energies mean that the LHC can produce heavier particles (because of E=mc*) — 
and perhaps some of those predicted by the theory of supersymmetry, or SUSY. An extension 
to the standard model of particle physics, SUSY postulates a giant 'superpartner' for each 
known particle, and would offer explanations for mysteries such as the nature of dark matter. 
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Higher energy 


Decays decoded 


If the LHC makes supersymmetric particles, their lifetimes 
will be fleeting. But physicists can deduce their presence 
from the more-stable decay products. In at least one case, 
such SUSY clues could also be evidence for dark matter. 
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that are analysed by four detectors: 
CMS, ATLAS, ALICE and LHCb. 


Hardware rebooted 


The gluino is superpartner of the gluon, which 
carries the strong force that binds the quarks in vw 
protons. So both squarks and gluinos should 

show up more often in proton-proton collisions 
than should other supersymmetric particles. 
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Superconducting magnets will 
operate at higher currents to 
provide the force needed to 
steer the more energetic 
beams in a circle. 


10,000 new electrical 
connectors fitted 
between magnets will 
divert current if there 
is a fault. 


Renovated cryogenics keep 
magnets cold enough to maintain 
a superconducting state, in which 
they have no resistance and so 
generate high current. 


Upgrades to the LHC will allow it to fire 
proton beams at higher rates and 
energies than it did in its first run. 


More collisions 


have almost no 
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interaction with normal 
matter — meaning that ee 
it would slip through 
the LHC’s detectors — 
making it a candidate 
constituent of dark 
matter. 
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Missing 
energy 


Beams are composed 
of bunches of billions 


Higher energy 


The inside of the beam pipe 
has been coated with a 
protective material to make 
the vacuum more secure. 


Collision energy will increase 
from the 8 TeV of run 1 to 
13 TeV and probably up to 
14 TeV by the end of run 2. 
The machine was initially 


of protons, which 
travel at close to the 
speed of light. 


The Higgs factory 


LHC experiments 
discovered the Higgs 
boson but they did not 


Physicists will look for these quarks, and see whether 


their total energy and momentum adds up to that of 
the two gluons that sparked the collision. Just the right 
amount of ‘missing energy’ would suggest the presence 
of neutralinos — and, by a process of deduction, the 
other supersymmetric particles in the decay chain. 


B quarks 


RUN 1: Vacuum supposed to run at this produce enough of the 


particles to examine 
their properties in 
much depth. 


energy before it was damaged 
by a short circuit in 2008. 


Photons 


The Higgs is detected 
through the particles 
it decays into, each of 
which is expected to 
be produced with a 
particular frequency. 


Known unknowns 


More collisions will help to resolve some ongoing mysteries. 
One of these concerns an anomaly in the way a transient 
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the diameter of the and have higher energies More collisions 
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at behaviour that does 
not fit with the standard 
model. There may even 
prove to be more than 
one type of Higgs. 


Upgrades will increase CERN’s 
annual electricity bill by 20% 
to €60 million (US$65 million). 


increase from 600 million to 
more than 1 billion per second, 
thanks in part to a collision 
area that has reduced from 

75 to 48 micrometres across. 
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particle called a B* meson decays. 


The B* meson can decay 
4 in two ways that should > 
be equally rare. 


In run 1, the LHCb detector 
saw the electron decay 
pathway occurring 25% more 
often, which could suggest 
eo the influence of particles 
beyond the standard model. 


Made up of different 
combinations of oe 


penis But further examples in run 2 
are needed to confirm that 
this is not a statistical fluke. 
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human 


Momentum is building to 
establish a new geological epoch 
that recognizes humanity’s 
impact on the planet. But there is 
fierce debate behind the scenes. 


BY RICHARD MONASTERSKY 


Almost all the dinosaurs have vanished from the National
Museum of Natural History in Washington DC. The fossil 
hall is now mostly empty and painted in deep shadows as 
palaeobiologist Scott Wing wanders through the cavern- 
ous room. 

Wing is part of a team carrying out a radical, US$45-million redesign
of the exhibition space, which is part of the Smithsonian Institution. And 
when it opens again in 2019, the hall will do more than revisit Earth’s 
distant past. Alongside the typical displays of Tyrannosaurus rex and 
Triceratops, there will be a new section that forces visitors to consider 
the species that is currently dominating the planet. 

“We want to help people imagine their role in the world, which is 
maybe more important than many of them realize,” says Wing.

This provocative exhibit will focus on the Anthropocene — the slice 
of Earth’s history during which people have become a major geological 
force. Through mining activities alone, humans move more sediment 
than all the world’s rivers combined. Homo sapiens has also warmed the 
planet, raised sea levels, eroded the ozone layer and acidified the oceans. 

Given the magnitude of these changes, many researchers propose 
that the Anthropocene represents a new division of geological time. 
The concept has gained traction, especially in the past few years — and 
not just among geoscientists. The word has been invoked by archaeologists,
historians and even gender-studies researchers; several museums
around the world have exhibited art inspired by the Anthropocene; and
the media have heartily adopted the idea. “Welcome to the Anthropocene,”
The Economist announced in 2011.

ILLUSTRATION BY JESSICA FORTNER

© 2015 Macmillan Publishers Limited. All rights reserved

The greeting was a tad premature. Although the term is trending, 
the Anthropocene is still an amorphous notion — an unofficial name 
that has yet to be accepted as part of the geological timescale. That 
may change soon. A committee of researchers is currently hashing out 
whether to codify the Anthropocene as a formal geological unit, and 
when to define its starting point. 

But critics worry that important arguments 
against the proposal have been drowned out by 
popular enthusiasm, driven in part by environ- 
mentally minded researchers who want to high- 
light how destructive humans have become. 
Some supporters of the Anthropocene idea 
have even been likened to zealots. “There's a 
similarity to certain religious groups who are 
extremely keen on their religion — to the extent 
that they think everybody who doesn't practise 
their religion is some kind of barbarian,” says one
geologist who asked not to be named. 

The debate has shone a spotlight on the typi- 
cally unnoticed process by which geologists 
carve up Earth’s 4.5 billion years of history. Nor- 
mally, decisions about the geological timescale 
are made solely on the basis of stratigraphy — the 
evidence contained in layers of rock, ocean sedi- 
ments, ice cores and other geological deposits. 
But the issue of the Anthropocene “is an order of magnitude more
complicated than the stratigraphy”, says Jan Zalasiewicz, a geologist at the
University of Leicester, UK, and the chair of the Anthropocene Working 
Group that is evaluating the issue for the International Commission on 
Stratigraphy (ICS). 


WRITTEN IN STONE 

For geoscientists, the timescale of Earth’s history rivals the periodic table 
in terms of scientific importance. It has taken centuries of painstaking 
stratigraphic work — matching up major rock units around the world 
and placing them in order of formation — to provide an organizing 
scaffold that supports all studies of the planet’s past. “The geologic
timescale, in my view, is one of the great achievements of humanity,” says
Michael Walker, a Quaternary scientist at the University of Wales Trinity 
St David in Lampeter, UK. 

Walker’s work sits at the top of the timescale. He led a group that 
helped to define the most recent unit of geological time, the Holocene 
epoch, which began about 11,700 years ago. 

The decision to formalize the Holocene in 2008 was one of the most 
recent major actions by the ICS, which oversees the timescale. The com- 
mission has segmented Earth’s history into a series of nested blocks, much 
like the years, months and days of a calendar. In geological time, the 66 
million years since the death of the dinosaurs is known as the Cenozoic 
era. Within that, the Quaternary period occupies the past 2.58 million 
years — during which Earth has cycled in and out of a few dozen ice ages.
The vast bulk of the Quaternary consists of the Pleistocene epoch, with the 
Holocene occupying the thin sliver of time since the end of the last ice age. 
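
The nesting described above can be sketched as a small tree. This is an illustration only, not an ICS data format: the class and function names are hypothetical, only the units named in this article are included, and the Pleistocene is assumed to start at the same moment as the Quaternary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TimeUnit:
    name: str
    rank: str                 # "era", "period" or "epoch"
    start_years_ago: float    # age of the unit's lower boundary
    children: list[TimeUnit] = field(default_factory=list)


# Only the units mentioned in the article; other periods and epochs are omitted.
holocene = TimeUnit("Holocene", "epoch", 11_700)
pleistocene = TimeUnit("Pleistocene", "epoch", 2_580_000)  # assumed start
quaternary = TimeUnit("Quaternary", "period", 2_580_000, [pleistocene, holocene])
cenozoic = TimeUnit("Cenozoic", "era", 66_000_000, [quaternary])


def units_containing(unit: TimeUnit, years_ago: float) -> tuple[str, ...]:
    """Return the chain of nested units that contain a given moment."""
    if years_ago > unit.start_years_ago:
        return ()  # the moment is older than this unit
    # Among sub-units that had already begun, the most recently started one applies.
    begun = [c for c in unit.children if c.start_years_ago >= years_ago]
    if not begun:
        return (unit.name,)
    youngest = min(begun, key=lambda c: c.start_years_ago)
    return (unit.name,) + units_containing(youngest, years_ago)


if __name__ == "__main__":
    for age in (5_000, 20_000, 10_000_000):
        print(age, units_containing(cenozoic, age))
```

Adding an Anthropocene epoch would, in this picture, mean appending one more child to the Quaternary with a start date later than that of the Holocene.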

When Walker and his group defined the beginning of the Holocene, 
they had to pick a spot on the planet that had a signal to mark that boundary.
Most geological units are identified by a specific change recorded in
rocks — often the first appearance of a ubiquitous fossil. But the Holo- 
cene is so young, geologically speaking, that it permits an unusual level 
of precision. Walker and his colleagues selected a
climatic change — the end of the last ice age’s final
cold snap — and identified a chemical signature
of that warming at a depth of 1,492.45 metres in a
core of ice drilled near the centre of Greenland¹. A
similar fingerprint of warming can be seen in lake
and marine sediments around the world, allowing geologists to precisely
identify the start of the Holocene elsewhere.

Even as the ICS was finalizing its decision on the start of the Holo- 
cene, discussion was already building about whether it was time to end 
that epoch and replace it with the Anthropocene. This idea has a long 
history. In the mid-nineteenth century, several geologists sought to 
recognize the growing power of humankind by referring to the present 
as the ‘anthropozoic era’, and others have since made similar proposals,
sometimes with different names. The idea
has gained traction only in the past few years, 
however, in part because of rapid changes in the 
environment, as well as the influence of Paul 
Crutzen, a chemist at the Max Planck Institute
for Chemistry in Mainz, Germany. 

Crutzen has first-hand experience of how 
human actions are altering the planet. In the 1970s 
and 1980s, he made major discoveries about the 
ozone layer and how pollution from humans could 
damage it — work that eventually earned him a 
share of a Nobel prize. In 2000, he and Eugene 
Stoermer of the University of Michigan in Ann 
Arbor argued that the global population has gained 
so much influence over planetary processes that 
the current geological epoch should be called 
the Anthropocene². As an atmospheric chemist,
Crutzen was not part of the community that adju- 
dicates changes to the geological timescale. But the 
idea inspired many geologists, particularly Zalasie- 
wicz and other members of the Geological Society of London. In 2008, 
they wrote a position paper urging their community to consider the idea³.

Those authors had the power to make things happen. Zalasiewicz 
happened to be a member of the Quaternary subcommission of the ICS, 
the body that would be responsible for officially considering the sugges- 
tion. One of his co-authors, geologist Phil Gibbard of the University of 
Cambridge, UK, chaired the subcommission at the time. 

Although sceptical of the idea, Gibbard says, “I could see it was 
important, something we should not be turning our backs on.” The next 
year, he tasked Zalasiewicz with forming the Anthropocene Working 
Group to look into the matter. 


A NEW BEGINNING

Since then, the working group has been busy. It has published two large 
reports (“They would each hurt you if they dropped on your toe,” says 
Zalasiewicz) and dozens of other papers. 

The group has several issues to tackle: whether it makes sense to 
establish the Anthropocene as a formal part of the geological timescale; 
when to start it; and what status it should have in the hierarchy of
geological time, if it is adopted.

When Crutzen proposed the term Anthropocene, he gave it the suffix 
appropriate for an epoch and argued for a starting date in the late eight- 
eenth century, at the beginning of the Industrial Revolution. Between 
then and the start of the new millennium, he noted, humans had chewed 
a hole in the ozone layer over Antarctica, doubled the amount of methane
in the atmosphere and driven up carbon dioxide concentrations by
30%, to a level not seen in 400,000 years. 

When the Anthropocene Working Group started investigating, it 
compiled a much longer list of the changes wrought by humans.
Agriculture, construction and the damming of rivers are stripping away
sediment at least ten times as fast as the natural forces of erosion. Along
some coastlines, the flood of nutrients from fertilizers has created
oxygen-poor ‘dead zones’, and the extra CO₂ from fossil-fuel burning
has acidified the surface waters of the ocean by 0.1 pH units. The fin- 
gerprint of humans is clear in global temperatures, the rate of species 
extinctions and the loss of Arctic ice. 

The group, which includes Crutzen, initially leaned towards his
idea of choosing the Industrial Revolution as the beginning of the
Anthropocene. But other options were on the table.

EARLY-ANTHROPOCENE PROPOSAL

Humans began transforming the land surface thousands of years ago,
through agriculture and other activities. That has led some researchers
to propose an early start date for the Anthropocene. One potential
stratigraphic marker is a rise in the atmospheric concentration of
methane millennia ago, which is recorded in glacial ice. This could
reflect increases in farming and animal herding.

Some researchers have argued for a starting time that coincides 
with an expansion of agriculture and livestock cultivation more than 
5,000 years ago⁴, or a surge in mining more than 3,000 years ago (see
‘Humans at the helm’). But neither the Industrial Revolution nor those
earlier changes have left unambiguous geological signals of human 
activity that are synchronous around the globe. 

This week in Nature, two researchers propose that a potential marker
for the start of the Anthropocene could be a noticeable drop in atmospheric
CO₂ concentrations between 1570 and 1620, which is recorded
in ice cores (see page 171). They link this change to the deaths of some 
50 million indigenous people in the Americas, triggered by the arrival of 
Europeans. In the aftermath, forests took over 65 million hectares of abandoned
agricultural fields — a surge of regrowth that reduced global CO₂.

In the working group, Zalasiewicz and others have been talking 
increasingly about another option — using the geological marks left
by the atomic age. Between 1945 and 1963, when the Limited Nuclear
Test Ban Treaty took effect, nations conducted some 500 above-ground 
nuclear blasts. Debris from those explosions circled the globe and created 
an identifiable layer of radioactive elements in sediments. At the same 
time, humans were making geological impressions in a number of other 
ways — all part of what has been called the Great Acceleration of the 
modern world. Plastics started flooding the environment, along with 
aluminium, artificial fertilizers, concrete and leaded petrol, all of which 
have left signals in the sedimentary record. 

In January, the majority of the 37-person working group offered its 
first tentative conclusion. Zalasiewicz and 25 other members reported⁵
that the geological markers available from the mid-twentieth century 
make this time “stratigraphically optimal” for pick- 
ing the start of the Anthropocene, whether or not it 
is formally defined. Zalasiewicz calls it “a candidate 
for the least-worst boundary”. 

The group even proposed a precise date: 16 July 
1945, the day of the first atomic-bomb blast. Geolo- 
gists thousands of years in the future would be able 
to identify the boundary by looking in the sedi- 
ments for the signature of long-lived plutonium 
from mid-century bomb blasts or many of the other 
global markers from that time. 


A MANY-LAYERED DEBATE 

The push to formalize the Anthropocene upsets 
some stratigraphers. In 2012, a commentary published
by the Geological Society of America⁶ asked:
“Is the Anthropocene an issue of stratigraphy or pop
culture?” Some complain that the working group 
has generated a stream of publicity in support of the 
concept. “I’m frustrated because any time they do 
anything, there are newspaper articles,” says Stan
Finney, a stratigraphic palaeontologist at California State University in 
Long Beach and the chair of the ICS, which would eventually vote on any 
proposal put forward by the working group. “What you see here is, it’s 
become a political statement. That’s what so many people want.”

Finney laid out some of his concerns in a paper⁷ published in 2013.
One major question is whether there really are significant records of 
the Anthropocene in global stratigraphy. In the deep sea, he notes, the 
layer of sediments representing the past 70 years would be thinner than 
1 millimetre. An even larger issue, he says, is whether it is appropriate 
to name something that exists mainly in the present and the future as 
part of the geological timescale. 

Some researchers argue that it is too soon to make a decision — it will 
take centuries or longer to know what lasting impact humans are having 
on the planet. One member of the working group, Erle Ellis, a geographer 
at the University of Maryland, Baltimore County, says that he raised the 
idea of holding off with fellow members of the group. “We should set a 
time, perhaps 1,000 years from now, in which we would officially investigate
this,” he says. “Making a decision before that would be premature.”

That does not seem likely, given that the working group plans to 
present initial recommendations by 2016. 

Some members with different views from the majority have dropped 
out of the discussion. Walker and others contend that human activities 
have already been recognized in the geological timescale: the only dif- 
ference between the current warm period, the Holocene, and all the 
interglacial times during the Pleistocene is the presence of human socie- 
ties in the modern one. “You've played the human card in defining the 
Holocene. It’s very difficult to play the human card again,” he says. 

Walker resigned from the group a year ago, when it became clear that 
he had little to add. He has nothing but respect for its members, he says, 
but he has heard concern that the Anthropocene movement is picking up 
speed. “There's a sense in some quarters that this is something of a
juggernaut,” he says. “Within the geologic community, particularly within the
stratigraphic community, there is a sense of disquiet.”

Zalasiewicz takes pains to make it clear that the working group has
not yet reached any firm conclusions. “We need to discuss the utility of
the Anthropocene. If one is to formalize it, who would that help, and to
whom it might be a nuisance?” he says. “There is lots of work still to do.”

Any proposal that the group did make would still need to pass a series 
of hurdles. First, it would need to receive a supermajority — 60% sup- 
port — in a vote by members of the Quaternary subcommission. Then it 
would need to reach the same margin in a second vote by the leadership 
of the full ICS, which includes chairs from groups that study the major 
time blocks. Finally, the executive committee of the International Union 
of Geological Sciences must approve the request. 
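
The sequence of hurdles can be written down as a short check. This is a sketch under stated assumptions rather than the ICS's own rules: it reads “60% support” as at least 60% of the votes cast, and the function names and vote counts are hypothetical.

```python
def meets_supermajority(votes_for: int, votes_cast: int, percent: int = 60) -> bool:
    """True if support is at least `percent` of the votes cast (assumed reading)."""
    # Integer arithmetic avoids rounding problems exactly at the threshold.
    return votes_cast > 0 and votes_for * 100 >= percent * votes_cast


def proposal_outcome(subcommission, ics_leadership, iugs_approves: bool) -> str:
    """Walk a proposal through the three hurdles in order.

    `subcommission` and `ics_leadership` are (votes_for, votes_cast) pairs.
    Returns the stage at which the proposal stalls, or "ratified".
    """
    if not meets_supermajority(*subcommission):
        return "stalled at Quaternary subcommission"
    if not meets_supermajority(*ics_leadership):
        return "stalled at ICS leadership"
    if not iugs_approves:
        return "stalled at IUGS executive committee"
    return "ratified"


if __name__ == "__main__":
    # Hypothetical tallies, for illustration only.
    print(proposal_outcome((13, 20), (12, 18), True))
    print(proposal_outcome((11, 20), (12, 18), True))
```

Because every stage must be cleared in turn, a proposal with strong but not overwhelming backing can stall at any of the three points.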

At each step, proposals are often sent back for revision, and they 
sometimes die altogether. It is an inherently con- 
servative process, says Martin Head, a marine 
stratigrapher at Brock University in St Catharines, 
Canada, and the current head of the Quaternary 
subcommission. “You are messing around with a 
timescale that is used by millions of people around 
the world. So if you're making changes, they have to 
be made on the basis of something for which there 
is overwhelming support.” 

Some voting members of the Quaternary sub- 
commission have told Nature that they have not 
been persuaded by the arguments raised so far in 
favour of the Anthropocene. Gibbard, a friend of 
Zalasiewicz’s, says that defining this new epoch will 
not help most Quaternary geologists, especially 
those working in the Holocene, because they tend 
not to study material from the past few decades or 
centuries. But, he adds: “I don't want to be the per- 
son who ruins the party, because a lot of useful stuff 
is coming out as a consequence of people thinking 
about this in a systematic way.”

If a proposal does not pass, researchers could continue to use the name
Anthropocene on an informal basis, in much the same way as archaeo- 
logical terms such as the Neolithic era and the Bronze Age are used today. 
Regardless of the outcome, the Anthropocene has already taken on a life 
of its own. Three Anthropocene journals have started up in the past two
years, and the number of papers on the topic is rising sharply, with more 
than 200 published in 2014. 

By 2019, when the new fossil hall opens at the Smithsonian’s natural 
history museum, it will probably be clear whether the Anthropocene 
exhibition depicts an official time unit or not. Wing, a member of the 
working group, says that he does not want the stratigraphic debate to 
overshadow the bigger issues. “There is certainly a broader point about 
human effects on Earth systems, which is way more important and also 
more scientifically interesting.”

As he walks through the closed palaeontology hall, he points out how 
much work has yet to be done to refashion the exhibits and modernize 
the museum, which opened more than a century ago. A hundred years 
is a heartbeat to a geologist. But in that span, the human population 
has more than tripled. Wing wants museum visitors to think, however 
briefly, about the planetary power that people now wield, and how that 
fits into the context of Earth’s history. “If you look back from 10 million 
years in the future,” he says, “you'll be able to see what we were doing 
today.” SEE EDITORIAL P.129


Richard Monastersky is a features editor for Nature in 
Washington DC. 


1. Walker, M. et al. J. Quat. Sci. 24, 3-17 (2009).
2. Crutzen, P. J. & Stoermer, E. F. IGBP Newsletter 41, 17-18 (2000).
3. Zalasiewicz, J. et al. GSA Today 18(2), 4-8 (2008).
4. Ruddiman, W. F. Ann. Rev. Earth. Planet. Sci. 41, 45-68 (2013).
5. Zalasiewicz, J. et al. Quatern. Int. http://dx.doi.org/10.1016/j.quaint.2014.11.045 (2015).
6. Autin, W. J. & Holbrook, J. M. GSA Today 22(7), 60-61 (2012).
7. Finney, S. C. Geol. Soc. Spec. Publ. 395, 23-28 (2013).

12 MARCH 2015 | VOL 519 | NATURE | 147 




WARS WITHOUT END


The world is full of bloody conflicts that can drag on for decades. Some 
researchers are trying to find resolutions through complexity science. 


In the seven decades that Colombia has been
riven by civil war, the country has seen 
kidnappings, rapes, terrorist attacks and 
pitched battles that have cost more than 
220,000 lives and displaced millions of peo- 
ple. Negotiations, peace accords and ceasefires 
have come and gone to little lasting effect. 
The latest round of this seemingly unending 
cycle began in August 2012, when the Marx- 
ist rebels of the Revolutionary Armed Forces 
of Colombia (FARC) agreed to meet with the 
central government in yet another round of 
peace talks. But the negotiations collapsed 
in November after the rebels kidnapped a
Colombian army general. The talks have since
resumed, but even if they one day yield a peace 
accord, there is no guarantee it will hold. More 
than one-third of the world’s peace agreements 
and ceasefires since the 1950s have relapsed 
into violence within five years. 

Colombia’s long history of strife is a clas- 
sic example of ‘intractable’ conflict — a 
self-perpetuating cycle of hostility that can 
grind on for decades. Such conflicts are rela- 
tively scarce — only about 5% of the world’s 
myriad wars qualify — but their longevity 
means that they exert a huge toll on societies. 
Their tragic poster child is the 68-year-long
Israeli–Palestinian conflict. But the list
also includes India and Pakistan's equally
long battle over Kashmir, and Sri Lanka’s
26-year civil war. The Democratic Republic
of the Congo (DRC) has been riven by conflict
since 1996, as has South Sudan since its
inception in 2011. Any number of intractable
conflicts may now be emerging in the Middle
East as Libya, Syria and Iraq are ripped apart
by sectarian violence and by the rise of the
Islamist group ISIS (see ‘Intractable conflicts’).
The intensifying civil war in eastern Ukraine
may eventually join the list as well.

[Photo caption: The Israeli–Palestinian conflict has been ongoing for 68 years.]

By definition, these are the conflicts that are 
resistant to all the mainstream techniques of 
dispute resolution, says Robert Ricigliano, a 
mediation expert at the University of Wiscon- 
sin Milwaukee. Typically they are plagued by 
a history of “fixes that fail”, he says — peace 
agreements that collapse within days or weeks. 
“We mediate agreements, change leaders, arbi- 
trate boundaries,” he says. “But those things 
don't necessarily get at the underlying dynamics
fuelling conflict.”

He and a growing chorus of other conflict 
researchers have therefore been pushing for a 
fresh approach — one that views intractable 
conflicts as dynamic, complex systems simi- 
lar to cells, ant colonies or cities, and analyses 
them with the mathematical and computa- 
tional tools developed over the past 30 years 
in complexity science. 

Mainstream practitioners tend to be 
dubious, says Dan Smith, head of the Lon- 
don-based peace-building organization 
International Alert. “We know that conflicts 
are complex,” he says. “What would be useful 
would be a clearer idea of what to do about it.”

But Ricigliano and others have begun to 
answer that criticism by using complexity- 
inspired techniques to help resolve conflicts 
in places such as the DRC. They say that the 
approach can be a much-needed corrective 
to business as usual in the conflict-resolution 
world, where governments and international 
organizations too often tackle conflicts piece- 
meal. These bodies tend to “look at the econ- 
omy, or governance, or gender relations or 
education as if each existed in isolation”, says
Smith. “It’s a convenient way to handle the 
issues, but it means you don't really address 
the complex reality.” 


HARD PROBLEMS 

It was just this kind of blinkered thinking that 
led psychologist Peter Coleman to rebel. It was 
2000, recalls Coleman, head of the Morton 
Deutsch International Center for Cooperation 
and Conflict Resolution at Columbia Univer- 
sity in New York City. He had broken his foot 
and decided to spend his convalescence at 
home delving into the research literature on 
intractable conflict. But what he found left him 
deeply frustrated. “People had their simple, 
sovereign theories about why conflicts become 
intractable,” he says. “It’s because of trauma, or 
social identity or a history of humiliation. We 
understood pieces of the problem, but not how 
they interact.”

Coleman discovered an alternative 
approach just a few years later, when he came 
across the work of social psychologists Robin 
Vallacher and Andrzej Nowak, both now at 
Florida Atlantic University in Boca Raton. 
Their work was not directly related to con- 
flict — they were studying things such as how 
the human sense of self emerges, and how feelings
about others can switch from positive to
negative. But Coleman was impressed with
Vallacher and Nowak’s use of a mathematical
tool known as dynamical systems theory to
analyse their results.

“SUCCESS DOESN'T MEAN THAT WE'VE ENDED THE CONFLICT.
IT MEANS WE'VE ENGAGED A SYSTEM SO THAT VIOLENCE
DECLINES OVER TIME.”

Made famous by James Gleick’s 1987 book 
Chaos, this theory provides a framework for 
understanding a remarkably broad range of 
complex systems, from weather patterns to 
neural activity in the brain. One way to visu- 
alize the mathematics is to imagine a land- 
scape of hills and valleys. The behaviour of the 
complex system corresponds to the path of a
ball rolling across this landscape. The trajec- 
tory becomes very complicated as the ball is 
deflected by the hills. But eventually, the ball 
will get trapped in one of the valleys, where it 
will either cycle endlessly around the walls or 
sink to the middle and lie still. The ball’s final 
trajectory or resting place is called an attractor. 
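
The rolling-ball picture can be made concrete with a small numerical sketch. It is an illustration only, not a model from the research described here. It assumes a one-dimensional landscape with two valleys, V(x) = x⁴/4 − x²/2, and a heavily damped ball that always moves downhill; the landscape, the step size and the function names are all assumptions chosen for simplicity. It shows only the kind of attractor in which the ball sinks to the bottom and lies still, not the endlessly cycling kind.

```python
def slope(x: float) -> float:
    """Derivative of the assumed landscape V(x) = x**4/4 - x**2/2."""
    return x**3 - x


def settle(x0: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Follow a heavily damped ball downhill from x0 using simple Euler steps."""
    x = x0
    for _ in range(steps):
        x -= dt * slope(x)  # move against the slope, that is, downhill
    return x


if __name__ == "__main__":
    for start in (0.3, 2.0, -0.3, -2.0):
        print(f"start {start:+.1f} -> rests near {settle(start):+.3f}")
```

Starting points on either side of the hill at x = 0 end up in different valleys, at x = +1 or x = −1, which is the sense in which a system's history determines where it becomes trapped.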

To Coleman, this kind of entrapment was 
the perfect metaphor for the stable, if destruc- 
tive, patterns of social behaviour seen in intrac- 
table conflicts. The landscapes in this case are 
mainly psychological and social, comprising 
innumerable strata of history, identity and 
collective memories of harms suffered at the 
hands of the ‘other’. Yet the resulting conflict 
attractors are terribly real, he says, with psy- 
chological forces conspiring to “create simplis- 
tic narratives about conflicts that are devoid of 
nuance and keep us locked in”. 

To make this mathematical view of intracta- 
ble conflicts into something more than a meta- 
phor — and hopefully to turn it into a set of tools 
that could make a difference in the real world — 
Coleman, Vallacher and Nowak in 2004 formed 
the Dynamics of Conflict working group, which 
has since attracted four more members. 

As a result of this collaboration, Nowak 
has started to create computational models 
that capture the dynamics of conflicts. These 
include ‘agent-based’ simulations that contain 
thousands of digital robots — the agents — 
each of which embodies some of the simple 
behaviours that social psychologists believe 
have a role in conflict. One such model,
developed with researchers outside the working
group, features agents that vary in how
competitive or cooperative they are, and adjust 
those proclivities according to how much hos- 
tility or aggression they experience from the 
other agents. 
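
A toy version of such a model might look like the sketch below. It is not Nowak's model: the pairing rule, the `threshold` and `rate` parameters and the function names are hypothetical choices made only to illustrate agents that shift between cooperation and competition depending on the hostility they meet.

```python
import random


def adjust(own, felt, threshold, rate):
    """Escalate if the hostility just experienced is high, otherwise calm down."""
    if felt > threshold:
        return own + rate * (1.0 - own)  # drift towards 1 (fully competitive)
    return own - rate * own              # drift towards 0 (fully cooperative)


def simulate(start_low, start_high, n_agents=50, n_rounds=5000,
             threshold=0.5, rate=0.1, seed=0):
    """Return each agent's final hostility, a number between 0 and 1."""
    rng = random.Random(seed)
    hostility = [rng.uniform(start_low, start_high) for _ in range(n_agents)]
    for _ in range(n_rounds):
        a, b = rng.sample(range(n_agents), 2)  # two distinct agents meet
        felt_by_a, felt_by_b = hostility[b], hostility[a]
        hostility[a] = adjust(hostility[a], felt_by_a, threshold, rate)
        hostility[b] = adjust(hostility[b], felt_by_b, threshold, rate)
    return hostility


calm = simulate(0.0, 0.4)   # a community that starts out mostly cooperative
tense = simulate(0.6, 1.0)  # a community that starts out mostly hostile
print(sum(calm) / len(calm), sum(tense) / len(tense))
```

Under these assumed rules the calm community settles further into cooperation while the tense one locks itself into mutual hostility, a crude analogue of the two kinds of attractor described above.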

In this simulation, small conflagrations
flare up and die down much as they do in real 
communities. Occasionally, however, the con- 
flicts expand until they lock the whole virtual 
community into a cycle of recrimination — 
the classic sign of intractability. Working with 
Dean Pruitt of the School for Conflict Analysis 
and Resolution at George Mason University in 
Arlington, Virginia, Nowak has also developed 
mathematical models showing how attractors 
can explain the escalation of conflicts that tips 
them into an intractable state. Now he and his
colleagues are working on the next step: com- 
paring the evolution of communities in these 
simple models with data from real-world conflicts
such as the Israeli–Palestinian stand-off.
“This is the first time we've added empirical 
data to a dynamical model, and we're getting 
promising results,” says Nowak.


MAKING SENSE OF THE SYSTEM 

Another line of research is to move from gen- 
eralities to specifics, and develop visualization 
tools that can help mediators to untangle the 
complexities of real-world conflicts. The hope 
is that such ‘conflict maps’ will help researchers 
to keep track of the interconnections between 
players and events, and make clear the feed- 
back loops and key networks that can escalate 
or inhibit conflict. 

Conflict maps can take many forms, from 
hand-drawn sketches on a whiteboard to 
computer-generated networks based on real 
data. But whatever their form, they get strong 
endorsement from Ricigliano, who has worked 
on peace-building interventions in areas rang- 
ing from Colombia and South Africa to Iraq 
and Cambodia. 

In 2000, for example, Ricigliano went to the 
DRC to try to find some resolution to the Sec- 
ond Congo War: a blood-drenched conflict 
between various rebel groups and Mai-Mai 
militias fighting for the government. Behind 
the scenes, he and his colleagues watched the 
unravelling of one hard-won peace agree- 
ment after another. “At best we were having 
a neutral impact,” he says, “and maybe even
a negative one.” 

But then in 2002, he and his colleagues 
began to map all the connections between 
warring parties and competing interests in 
the conflict. The maps made it clear that local 
groups were being manipulated by national 
rebel organizations, who wanted the conflict 
to continue because it allowed them to access 
valuable minerals. “So we shifted tactics, and 
began trying to break the links between the 
national-level actors who were manipulating 
local actors, and to facilitate local-level ceasefires
of significance,” says Ricigliano.

By 2003, these dialogues had helped the
United Nations to negotiate a transition gov- 
ernment that included the major rebel groups, 
and violence declined. “It wasn’t perfect,” says
Steve Smith, an independent conflict-reso- 
lution consultant who was in the DRC at the 
time. “Not everyone was in agreement, and lit- 
tle conflicts continued, but we had a structure 
in place and a direction to go.” 


STATE OF MIND 

Beyond the models and the maps, advocates of 
systems thinking are hoping to spread a shift 
in perspective on intractable conflicts. One 
convert is Andrea Bartoli, dean of the School 
of Diplomacy and International Relations at 
Seton Hall University in South Orange, New 
Jersey, and a mediator who has worked in 
countries such as Mozambique and Kosovo. 
When he first learned about the dynamical sys- 
tems perspective in discussions with Coleman 
a little over a decade ago, he says, “it provided
a new language for talking about conflict, 
and opened up new ways to think about old 
problems”. He has since joined the Dynamics 
of Conflict working group, and in 2009 joined 
with Coleman and Beth Fisher-Yoshida, director
of the Negotiation and Conflict Resolution
program at Columbia University, to set up the 
Advanced Consortium on Cooperation, Con- 
flict, and Complexity (AC4) there. 

That new language can be a revelation even
to professionals, says Naira Musallam, a con- 
flict researcher at New York University’s Center 
for Global Affairs and a member of both the 
Dynamics of Conflict group and AC4. She 
tells the story of a course she teaches at the US 
Military Academy at West Point in New York, in
which she starts by running through a list of 
common mental shortcomings in how peo- 
ple think about conflict, poverty and other 
social problems. “We compare fluid situa- 
tions to fixed things,” she says, “we think in 
straight lines rather than loops, we focus on 
understanding problems and assume that 
this will lead to solutions, and often miss the 
unintended consequences of well-intentioned 
interventions.” 



After one class, says Musallam, an officer 
who had served in Iraq and Afghanistan wrote 
to her. “I know many good people who have 
died because of errors [highlighted] on this 
list,” he wrote. “I also see several errors that I
have made before ... It’s frustrating that this is 
the first time that I’ve seen this list in a way that 
challenges my world view around conflict.” 

These same straight-line assumptions are 
also built into the way in which many institu- 
tions operate, says Musallam — and not just 
those devoted to peace-building. “They want 
nice, tidy plans for interventions, and clear 
deliverables over the short term,” she says. This
often leads to plans to ‘solve’ complex problems
through a series of discrete steps that are
defined in advance by experts. 

One of the key lessons of the systems mind- 
set is to stop approaching conflicts as prob- 
lems that need to be fixed, says Ricigliano, and 
instead think of them as systems with under- 
lying dynamics that need to shift. “Success 
doesn’t mean that we’ve ended the conflict,”
he says. “It means we've engaged a system so
that violence declines over time.”

This view is finding increasing support 
from outside allies. The non-profit Berlin- 
based Berghof Foundation, for example, has 
used systems thinking in its efforts to resolve 
political and ethnic violence in countries such 
as Sri Lanka, which has been torn by civil war 
since 1983. 

But there is plenty of room for scepticism. 
Dan Smith, for one, is sympathetic to the 
complex systems view of conflict, but is wary 
of its sweeping generalizations. “Any analysis 
employing these principles is only going to be 
as good as the analyst doing it,” he says. “You 
can have the best methodology, but if you have 
an uninformed or incurious analyst, you won't 
get good results.” 

Even advocates admit that specific recom- 
mendations are a work in progress. That is why 
in 2013, Coleman and Ricigliano joined with 
others to set up an annual five-day workshop 
known as the Dynamic Systems Theory Inno- 
vation Lab, which brings together biologists, 
economists, physicists, political scientists and 
other scholars and practitioners to talk about
real-world applications. “We hope that five
years out, we'll have a better idea of what matters
most,” says Coleman.

[Timeline figure, 1965–2015. Caption fragment: “… a sample of those
that have emerged since the Second World War.” Annotations: “Having
contributed to around 3.8 million deaths, this relatively new series of
conflicts is also one of the bloodiest.” “The Uighur people are of
Turkic descent and are trying to break free from China's rule.” “The
apartheid system ended when the government reached an agreement with
the African National Congress.”]

There is already a growing body of experi- 
ments they can draw on. In Israel, for exam- 
ple, a series of anti-conflict interventions being 
developed under the leadership of psychologist 
Eran Halperin at the Interdisciplinary Center 
Herzliya in Israel have proved effective in mak- 
ing people more open to seeing things from the 
other side's point of view.

Although the label ‘intractable conflict’ 
implies unending strife, no struggle lasts for- 
ever. As the 1980s drew to a close, South Africa 
had been locked in racial conflict for decades 
and was on the brink of a civil war between 
increasingly militant members of the African 
National Congress (ANC) and the government 
of President Frederik Willem De Klerk. Amid 
international condemnation of the apartheid 
system, and fearing that the country could 
become engulfed in a bloody street war, De 
Klerk began releasing imprisoned ANC mem- 
bers in late 1989. Finally, in February 1990, 
he freed ANC leader Nelson Mandela after 
27 years in prison. That conciliatory move was 
the tipping point for the emergence of multira- 
cial democracy within three years. 

South Africa's long transition was a difficult 
journey, with many losses and setbacks along 
the way — par for the course for any intractable 
conflict. Yet as Mandela once famously said: 
“It always seems impossible until it’s done.”


Dan Jones is a freelance writer in Brighton, 
UK. 

A Tuareg woman carries water through a sandstorm in drought-ridden Mali.


Put people at the centre of 
global risk management 


An individual focus is needed to assess interconnected threats 
and build resilience worldwide, urge Jan Willem Erisman and colleagues. 


Globalization is changing the nature
of risk. Natural and social systems
— from climate to energy, food, 
water and economies — are tightly coupled. 
Abrupt changes in one have a domino effect 
on others. Floods in Thailand in 2010, for 
example, led to a global shortage of com- 
puter hard disks as a result of factories 
closing, as well as more than US$330 million 
in damage and around 250 deaths. 


The exposure of people and assets to risks 
is increasing worldwide. From 1980 to 2012, 
annual economic losses from environmen- 
tal disasters rose more than sevenfold, from 
about $20 billion to $150 billion a year.

Yet most risk assessments ignore networked
threats. The annual Global Risks
report of the World Economic Forum 
considers risks qualitatively, based on 
the views of experts. But global outlooks
remain sectorial and too coarse to guide
individuals, organizations, municipalities 
or nations. 

Risk reports also neglect the collective 
impacts of personal choices. For example,
eating more beef causes deforestation
and biodiversity loss in the Amazon. Local 
dams for hydropower or water storage alter 
sediment flows to fertile coastal regions. 
The movement of people from the
countryside to cities affects water, food,
climatic and energy systems planet-wide. 

Understanding networked risks is 
essential for achieving the United Nations 
Sustainable Development Goals, which are 
being defined this year. The 17 proposed
goals are interdependent. For example, the 
stimulation of renewable energies and bio- 
fuels to address climate change also affects 
food production and water resources. 


BROAD FOCUS 

We propose a systems-based approach for 
quantifying risk that integrates individual 
responses and considers the transfer of 
information and feedback mechanisms 
across networks (see ‘Safety secured’). Such 
an approach identifies pinch points — geo- 
graphic, economic and social — so that key 
systems and individual behaviours can be 
made more sustainable and resilient. 


SAFETY SECURED 


Current global-change risk assessments 
take a top-down approach and target single 
stressors, such as the climate. They focus on 
the most vulnerable and at-risk communi- 
ties, infrastructure, sectors, ecosystems and 
areas. Links between extreme weather and 
climate change have begun to be addressed, 
but wider impacts on land degradation, food 
and energy production, water supply and 
environmental hazards have not. 

Disaster-reduction frameworks, such as 
the UN Post-2015 Development Agenda, 
which will be agreed this month in Japan, 
aim to improve reactions to adverse 
events once they have happened. But the 
UN agenda does not promote resilience 
in general or help stakeholders such as 
farmers or municipal leaders to manage 
multiple risks. 

Programmes for delivering knowledge
about risk to sectors of society are too narrow.
Climate services inform the agriculture and
insurance sectors about climate change. But
their academic focus does not serve corporate
clients, who want climate data packaged
into products that they can use to manage, for
example, exposure to market disruptions or
rising energy prices.

Promoting overall resilience (left) rather than managing many individual
risks (right) is the best way to minimize impacts from adverse events.

RESILIENCE                      RISK MANAGEMENT
Concerns whole system           Focuses on single risks
Aims for long-term security     Aims for short-term security
Requires indirect management    Requires direct intervention
Self-regulating                 Needs continuous monitoring
Makes use of variability        Eliminates variability
Seeks dynamic equilibrium       Seeks static equilibrium

The Climate Corporation in San 
Francisco, California, sells weather and 
agronomic data-monitoring and modelling 
tools to farmers. But its products do not, for 
instance, consider other impacts such as the 
risks of air and water pollution associated 
with the use of nitrogen fertilizers.

Approaches to communicating a broader 
set of global risks appeal to researchers and 
policy-makers. For example, the ‘planetary 
boundaries’ concept identifies tipping
points in nine key Earth systems (including 
climate change, biodiversity and the nitro- 
gen cycle) above which Earth’s habitability 
would be threatened. But global limits are 
difficult to translate into targets or strate- 
gies that are meaningful for a particular 
company, city or region. 

How then should scientists, insurance 
companies, policy-makers and other stake- 
holders combine risk assessments across 
scales, stressors and sectors? 


USER FIRST 

We argue that Earth-system risk management 
should follow the example of health-care sys- 
tems, in which emphasis is switching from 
medicalization to supporting people’s ability
to adapt and self-manage. Collectively, individual
choices feed back into the community
and help it to lower its health risks. 

SOURCE: J. TEN NAPEL, F. BIANCHI & M. W. P BESTMAN IN INVENTION
FOR A SUSTAINABLE DEVELOPMENT OF AGRICULTURE 32-53 (2006)

Risk management must therefore start
with the users — be they people, organi- 
zations, municipalities or nations. Risks 
should be identified and prioritized in 
expanding circles around the user (see 
‘Networked threats’), from local and short- 
term risks to more distant and long-term 
related global threats. 

Take food. Supplies are threatened by 
elevated production costs, ecosystem, water 
and soil-quality impairment, food wastage 
and nutrient losses, poor food distribution 
and alienation of consumers from produc- 
ers. Yet farmers consider only immediate 
factors — maximizing yields, avoiding dis- 
ease and short-term price fluctuations — 
when deciding how and when to plant crops. 

In our approach, farmers would also con- 
sider climate change, energy prices, floods 
and droughts and ecosystem services. The 
wider ecological and social repercussions of 
personal decisions such as whether to use 
more fertilizer or pesticides, expand soil till- 
age or irrigation would become more appar- 
ent. Worldwide, 10% of farmers manage 70% 
of the agricultural land, so the side effects 
of such localized choices can be widespread. 


RADICAL REFRAMING 

In practical terms, a networked risk-assess- 
ment model should combine standard tech- 
niques for individual risk assessments (such 
as those set out for enterprises by the Inter- 
national Organization for Standardization) 
with a mechanism to capture the complexi- 
ties of human behaviour. One such method 
is agent-based modelling, which uses simulations
of a collection of computational
entities that interact according to a set of
mathematical rules. This approach has been
used to model stock-market trends, traffic
flows and the spread of epidemics.

NETWORKED THREATS
As well as immediate risks such as droughts and floods, individuals
should factor remote threats such as climate change into their
decisions. If risks from the local to the global and connections between
them are assessed, people can choose effective actions that build
resilience.
[Figure; the only surviving label reads ‘Extreme weather, pests and diseases’.]
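
To give a flavour of what agent-based modelling means in this setting, here is a deliberately tiny sketch. It is not the networked risk-assessment model proposed in this article. The farmers, the `awareness` weights, the `gain` and `damage` parameters and the update rule are all hypothetical, chosen only to show individual choices feeding back through a shared consequence.

```python
def simulate(awareness, rounds=100, gain=1.0, damage=4.0, step=0.05):
    """Return final fertilizer use (0 to 1) for each farmer.

    awareness: one weight per farmer, from 0 (ignores shared pollution)
    to 1 (fully counts it when deciding).
    """
    use = [0.5] * len(awareness)
    for _ in range(rounds):
        pollution = sum(use) / len(use)  # shared result of everyone's choices
        new_use = []
        for f, w in zip(use, awareness):
            perceived_cost = w * damage * pollution
            if gain > perceived_cost:
                f = min(1.0, f + step)   # private benefit wins: apply more
            else:
                f = max(0.0, f - step)   # wider cost wins: apply less
            new_use.append(f)
        use = new_use
    return use


short_sighted = simulate([0.0] * 20)  # farmers who weigh only their own yield
aware = simulate([1.0] * 20)          # farmers who also weigh shared pollution
print(sum(short_sighted) / 20, sum(aware) / 20)
```

In this toy setting, farmers who ignore the shared cost all drift to maximum fertilizer use, whereas those who factor it in settle at a much lower level. A realistic model would need empirically grounded rules and data, which is the research effort called for below.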

Two major shifts in thinking are needed 
to deliver the global risk-network model. 
First, the risk narrative needs to be reframed 
to put the individual at the centre. Second, 
risk modelling should adapt to take a broad 
focus — encompassing environmental and 
socio-economic risks across the whole 
Earth system. 

The UN’s sustainability and disaster- 
reduction programmes should adopt this 
user-centric focus and redirect their exist- 
ing efforts. The UN-led Global Framework 
for Climate Services should be similarly 
extended to include inventories of issues 
that matter to the individual (collated 
through platforms such as the UN website 
vote.myworld2015.org). 

Relevant risks at particular scales will 
need to be defined and methods for analys- 
ing them jointly developed. Future Earth, a 
global research hub launching this year to 
provide the knowledge and support to accel- 
erate transformations to a sustainable world, 
should coordinate the research. 

Partnerships must be built across 
disciplines to supply and share data and 
analysis tools. Practitioners from the pri- 
vate and public sectors will need to work 
with economists, engineers, social scien- 
tists, information specialists and climate and 
Earth-system experts. 

Investment by public-private partner- 
ships will be essential to amass the neces- 
sary resources, maximize uptake of this 
multiscale approach, stimulate innovation
from industry and guarantee that the user’s 
needs are at the core. As the cost of disasters 
increases each year, the impetus for both 
governments and industry to invest in risk 
management and resilience is clear.

Learning English is essential for modern scientists, but German and French were once more significant.


LINGUISTICS 


The ascent of English 


Andrew Robinson salutes a chronicle of how one language came to dominate science. 


A scientific paper published in 1905
gloried in the title Zur Elektrodynamik
bewegter Körper. Today, Albert
Einstein’s ‘On the electrodynamics of moving
bodies’, which introduced the special theory of
relativity, would be published in English. Eng- 
lish has become the language of almost every 
leading journal across the natural sciences, 
whatever its country of origin. Large confer- 
ences held in non-anglophone countries, such 
as those of the European Geosciences Union, 
often use English. Of the major producers of 
scientific research, only China and, to a lesser 
extent, Japan host international conferences 
in their own languages. 

In 1905, however, some 30% of global 
scientific literature was in German, with a 
similar proportion in English, marginally 
less in French and much less in Russian and 
Japanese. So reveals US historian Michael 
Gordin in Scientific Babel, a massive, erudite 
and engaging study of the role of languages in 
science based on 15 years of research — and 
drawing on Gordin’s knowledge of French, 
German, Russian, Esperanto and Latin. The 
numerous translations are generally his own. 

The dominance of English, unpredicted
a century ago, is rooted in Germany’s defeat
in the First World War. For some years after- 
wards, there was an international boycott of 
German scientists and attempts were made to
curb the use of German by the League
of Nations and 22 US states. The advent of
the Third Reich in 1933 boosted English
as the scientific lingua franca, as did the
United States’ postwar ascendancy in scientific
output and geopolitical power, along
with a perception of English as neutral.

Scientific Babel: The Language of Science from the
Fall of Latin to the Rise of English
MICHAEL GORDIN
Profile/Univ. Chicago Press: 2015.

Gordin asks, with a 
touch of irony, whether this English-language 
“fait accompli” is always good for science. 
Although he finds that most scientists are in 
principle inclined to embrace the idea of one 
language for communicating, the dominance 
of English can disadvantage non-English 
speakers. The most creative thinking tends 
to be done in the language in which a person 
feels most at home. As Fields Medal winner 
Laurent Lafforgue noted (in French) in 2005: 
“it is to the degree that the French mathemati- 
cal school remains attached to French that it 
conserves its originality and its force”. 

Gordin asks: does history suggest a future
alternative? He considers relevant historical 
episodes in detail. Latin, for example, became 
the language of European science during 
the Italian Renaissance, but its use began to 
decline in the seventeenth century. Thus, 
Galileo Galilei turned to Italian, and Isaac 
Newton shifted from Latin for his Principia 
Mathematica (1687) to English for his Opticks 
(1704). During the Enlightenment, Euro- 
pean libraries collected roughly one-third 
of their books in Latin, one-third in French 
and the rest in the local vernacular. Barring 
taxonomic nomenclature, the use of Latin had 
died out among leading scientists by the time 
of Charles Darwin, who wrote in English. 

The linguistic complexity in science in the 
late nineteenth century is demonstrated by 
the story of the periodic table and its con- 
tested origin, which Gordin explored in 
his 2004 book A Well-Ordered Thing (Basic 
Books). When the German-language journal
Zeitschrift für Chemie mistranslated an
1869 Russian abstract by Dmitri Mendeleev,
a vehement priority dispute blew up between
Mendeleev and German chemist Lothar 
Meyer. In a crucial sentence, “The elements 
ordered according to the magnitude of their 
atomic weights show a periodic change in 
properties”, a rushed translator used the
German word stufenweise (‘phased’) instead
of periodische (‘periodic’); as a result, Meyer
claimed precedence for his own research.
When Mendeleev objected, Meyer replied:
“It seems to me an excessive demand that we
German chemists read, besides those articles
appearing in the German and Romance languages,
also those in the Slavic languages”.
He did not mention English.

By the end of the nineteenth century,
scientists everywhere were obsessed with a
multilingual information overload: Gordin’s
scientific babel. The solution seemed to
be an auxiliary universal language. Volapük
(‘Worldspeak’) was invented in 1880; the
better-known Esperanto arose in 1887, and
its offshoot, Ido, arrived in 1907. Gordin
sympathetically analyses these artificial languages,
taken seriously by leading scientists
of the time, through the lens of Ido advocate
Wilhelm Ostwald, a Nobel-prizewinning
German chemist. In-fighting dissolved the
movement, and Ostwald abandoned Ido
during the First World War, championing
German as an international language.

During the cold war, and especially after
the Soviet Union launched Sputnik in 1957,
much scientific attention switched
to literature in Russian, which by 1970
reached 20% of the global output.
In 1961, 85 Soviet journals were being
translated into English, with US government
funding. Preposterous claims were
made for machine translation from Russian
into English. Both translation programmes
were eventually abandoned in favour of
increased Russian-language teaching for US
scientists, until the 1991 collapse of the
Soviet Union sealed the fate of scientific Russian
beyond its own borders. A lively Russian-language
journals scene still prevails in Russia.

Anglophone dominance is unlikely to
change soon, says Gordin. If scientific importance
were based on population, Spanish
would be a major scientific language; if on
geopolitical power, scientists would publish
much more in Chinese. In the 1660s and later,
philosopher and mathematician Gottfried
Leibniz advocated a universal writing system
for science independent of any spoken
language, similar to mathematical notation.
This must stay a dream: intellectual activity
demands language. As the polyglot Gordin
concludes, “we remain bound to the constraints
of history, to the shackles of the words
in human languages: untranslatable yet intelligible,
frustrating yet infinitely beguiling”.


Andrew Robinson is the author of The
Story of Writing.
e-mail: andrew.robinson33@virgin.net


Rust: The Longest War
Jonathan Waldman SIMON AND SCHUSTER (2015)

Corrosion has killed people in nuclear power plants, taken out planes
in mid-air and reddened the face of Mars. So notes environmental
journalist Jonathan Waldman in this dexterous technological study of
this insidious process, which is nibbling away at Western civilization.
The science compels, but what leap from the page are Waldman’s
snapshots of rust geeks, such as the team that rebuilt the hole-ridden
metal skin of New York's Statue of Liberty in the 1980s, and
Bhaskar Neogi, ‘integrity manager’ of the Trans-Alaska Pipeline
System, one of the heftiest metal objects in the Western Hemisphere.


Producing Power: The Pre-Chernobyl History of the Soviet Nuclear
Industry
Sonja D. Schmid MIT PRESS (2015)

In the annals of nuclear meltdown, the April 1986 explosion
at Chernobyl in Soviet Ukraine remains the most devastating,
contaminating thousands of square kilometres of land. This
trenchant study by science historian Sonja Schmid digs deep into
the catastrophe’s tangled prehistory to make nuanced sense of
it. She unravels key scientific, social and political factors, from the
plant’s lack of ‘redundant’ safety features to rivalries in the Soviet
nuclear industry and inefficiencies in the country’s economy.


The Chimp and the River: How AIDS Emerged from an African Forest
David Quammen W. W. NORTON (2015)

This intense study of the origins of AIDS is excerpted and adapted by
David Quammen from his book Spillover (W. W. Norton, 2012; see
N. Wolfe Nature 490, 33; 2012). With Sherlockian verve, Quammen
traces the trail from the first human cases, through labs around the
world, and finally to virologist Beatrice Hahn’s discovery that simian
immunodeficiency virus (SIV), from which HIV-1 is derived, can kill
wild chimpanzees. Quammen’s portrait of the real ‘Patient Zero’ as
a Cameroonian hunter clumsily butchering a chimp is a masterful
summing-up of the evidence.


Science in Wonderland: The Scientific Fairy Tales of Victorian Britain
Melanie Keene OXFORD UNIVERSITY PRESS (2015)

The prodigious pace of Victorian research, from the unearthing
of dinosaur fossils to the laying of a transatlantic telegraph cable,
posed a stiff pedagogical challenge. To deliver the new findings on
nature to the public, writers seized on the era’s obsession with the
supernatural. Science historian Melanie Keene argues here that many
“fairy tales of science” were educational gems: by harnessing tropes
of the genre to communicate facts, they evoked a scientific wonder
that truly came into its own in the age of quantum mechanics and
relativity. (See M. Keene Nature 504, 374-375; 2013.)


Climate Shock: The Economic Consequences of a Hotter Planet
Gernot Wagner and Martin L. Weitzman PRINCETON UNIVERSITY PRESS
(2015)

Economists Gernot Wagner and Martin Weitzman deliver a high-voltage
shock in their analysis of the costs of climate change. With
uncurbed emissions predicted to rise steeply by 2100, a radical
reframing of the catastrophe as a global risk-management issue is
due, they argue. Their blueprint is a three-step response: scream
(call for business and policy-makers to snap to it); cope (adapt
rapidly to events); and profit (invest in green industry). Barbara Kiser

China is setting records for installing solar panels, even as most of the country’s energy comes from coal.


SUSTAINABILITY SCIENCE 


Exploiting the synergies 


Dave Griggs relishes Jeffrey Sachs’s analysis of the policy 
and practice key to a viable future for people and planet. 


As a concept and practice, sustainable
development emerged on the global
scene in 1972, with the United
Nations Conference on the Human Environ- 
ment. Four decades on, in the year that the 
United Nations is due to set its Sustainable 
Development Goals (SDGs), the idea remains 
fuzzy around the edges. Jeffrey Sachs's The 
Age of Sustainable Development sharpens our 
understanding. It is, in my view, the best, most 
comprehensive and most articulate exposi- 
tion of sustainable development ever written. 

Sachs is a rock-star economist, leading 
thinker in sustainable development and 
senior UN adviser. The Age of Sustain- 
able Development is based on his excellent 
massive open online course (MOOC) of the 
same name. 

He defines sustainable development as a 
“normative outlook” aiming to solve global 
problems such as climate change through 
environmental, economic and social goals, 
along with good governance. He shows that 
it is a science of complex systems: the global 
economy, the Earth 
system, politics and 
social interactions 
such as support net- 
works and social 
media. 


Sustainable development
was once considered
a problem of
developing countries, 
solvable through, and 
almost as a by-product 
of, economic growth. 
But no country has 
pulled itself out of 
poverty without fos- 
sil fuels, whose emis- 
sions drive climate 
change and pollution, 
or nitrogen-based 
fertilizers, which pro- 
mote algal blooms. And richer countries have 
demonstrated the problems of uncontrolled 
development of land and resources, a factor 
in biodiversity loss. Sustainable development 
is crucial for all countries, so the SDGs will 
apply to every nation. 

Sachs recognizes the benefits of economic 
growth, citing the case of China, which has 
achieved history's most remarkable economic 
transformation, with extreme poverty falling 
from 84% in 1981 to just 12% in 2010. How- 
ever, he also shows the limitations of growth 
through challenges still affecting billions, 
from poverty to food security. He explains 
some of how we got to where we are today,
highlighting the economic and social factors
that maintain the status quo or make things
worse, such as the historic, geographical and
political forces that are widening inequalities
in countries such as the United States.

The Age of Sustainable Development
JEFFREY D. SACHS
Columbia Univ. Press: 2015.

How to achieve a sustainable future? Edu- 
cation, Sachs notes, is a lynchpin. When girls 
stay in school for longer, fertility rates drop. 
Households with fewer children invest more 
in education, health and nutrition. He quotes 
Scottish economist Adam Smith, who wrote 
in The Wealth of Nations (1776) that because 
society benefits when people are educated, the 
costs should be “defrayed by the general con- 
tribution of the whole society”. That we have 
not achieved this more than two centuries 
later is a baffling and damning indictment. 

Alongside the social challenges are climate 
change, ocean acidification and the current 
mass extinction of species — serious threats 
to humanity’s capacity to thrive or even 
survive. For example, the concentration of 
carbon dioxide in the atmosphere is rising by 
more than 2 parts per million each year. Sachs 
concludes that no country is currently on a 
path to sustainable development. 

What becomes clear is that understand- 
ing the links between these issues is essential. 
Along with development aims such as sanita- 
tion and health care for a growing and ageing 
population, there are environmental chal- 
lenges such as mitigating climate change. It 
is important to pinpoint solutions with posi- 
tive trade-offs, such as encouraging people to 
walk or cycle, to reduce emissions while con- 
ferring health benefits. It is equally important 
to avoid fixing one problem by exacerbating 
another, for example providing universal 
access to affordable energy by burning more 
fossil fuels (see page 151). 

Sachs struggles with this, as do businesses 
and governments. He compartmentalizes the 
book into chapters dealing with issues such 
as health, food security and climate change, 
which fails to show the interdependent nature 
of the beast in all its horrifying complexity. 
But in a finale on the SDGs, he delivers a 
unified message clearer, more insightful and 
more accessible than previous attempts. 

Sachs explains the benefits of goal-based 
development such as mobilizing knowledge 
and practice networks — most importantly, 
those that include the scientific community, 
the public, politicians and non-governmental 
organizations. He explains how they might be 
financed through the public and private sec- 
tor, and governed with accountability, trans- 
parency and participation. 

I would make this book compulsory reading
for all politicians and business leaders.


David Griggs is a professor of sustainable 
development at Monash University in 
Melbourne, Australia, and the University of 
Warwick, UK. 

e-mail: dave.griggs@monash.edu 
Stamp out shabby
research conduct 


As heads of funding bodies for 
medical research, we are concerned 
that questionable practices among 
researchers seem to be becoming 
more prevalent. Although these 
do not meet current definitions of 
misconduct, they can still distort 
biomedical science and cause irre- 
producibility — with potentially 
critical consequences for policies 
and patients. 

For example, researchers 
may cut corners by withholding 
methodological details or 
by failing to disclose data for 
independent scrutiny. Inadequate 
training can also be responsible 
for false conclusions arising from 
flawed experimental design, 
methodology or statistical 
analysis. Some countries, 
including Australia, Canada and 
the Netherlands, have a category 
for these: ‘poor conduct’. This
must be addressed if proved, even 
though it is less egregious than 
research misconduct. 

International funding bodies, 
informally convening with heads 
of international biomedical 
research organizations, have 
agreed to undertake a worldwide 
analysis of definitions of 
different types of misconduct 
and the policies used to tackle 
them. This should help to 
harmonize standards of research 
rigour and integrity globally, for 
the ultimate benefit of patients. 
Warwick P. Anderson* National 
Health and Medical Research 
Council of Australia, Canberra, 
Australia. 
warwick.anderson@nhmrc.gov.au 
*On behalf of 4 correspondents 
(see go.nature.com/adxrpe for
full list).


Tax transactions to 
stabilize trading 


An obvious method for 
controlling high-speed trading 
(M. Buchanan Nature 518, 
161-163; 2015) is a global 
financial-transaction tax of 
the kind proposed by the
European Union in 2011.

Such a tax, originally designed 
to raise revenue, could be set to 
lead to a typical trading time. 
There would be no need to 
regulate trading times explicitly: 
it would simply not be profitable 
to trade on the tiny, rapid 
fluctuations that now trigger 
transactions. This solution 
would be simpler — and, with 
its revenues, more beneficial 
— than technical approaches 
such as ‘speed bumps’ that delay 
transactions. 
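The profitability argument can be illustrated with a back-of-the-envelope sketch. It assumes a flat tax charged on each leg of a round trip (a buy followed by a sell) and ignores the small difference in value between the two legs; the function names and all rates are hypothetical, not figures from the letter or from the European Union proposal.

```python
def net_return(price_move, tax_rate):
    """Approximate net return on a round-trip trade, as a fraction of traded value.

    price_move: fractional price change captured by the trade (hypothetical)
    tax_rate:   tax on each transaction, as a fraction of its value (hypothetical)
    The tax is paid twice: once on the buy and once on the sell.
    """
    return price_move - 2 * tax_rate


def break_even_move(tax_rate):
    """Smallest fractional price move that covers the tax on both legs."""
    return 2 * tax_rate


# A tiny, rapid fluctuation loses money once the tax is applied,
# whereas a larger, slower price move still pays.
tiny_move = net_return(price_move=0.0001, tax_rate=0.001)
large_move = net_return(price_move=0.01, tax_rate=0.001)
```

Under these assumptions the tax rate sets a minimum worthwhile price move, and hence indirectly a typical holding time, without any explicit rule on trading speed.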

The non-equilibrium, 
complex systems that correspond 
to economies often operate at the 
threshold of instability. Adding 
taxes (‘friction’) should not be 
seen as creating inefficiency, but 
as a stabilizing influence that 
can avoid the costs of dramatic 
crashes. 

John Bechhoefer Simon Fraser 
University, Burnaby, Canada. 
johnb@sfu.ca 


Undergraduate 
research in action 


Our programmes at California 
State University address
the challenges of bringing
undergraduates into research
labs (see Nature 518, 127-128;
2015). The students are then
better equipped for admission
to the top professional training
programmes in the United States 
and worldwide. 

More than 100 undergraduate 
research students are 
trained every year under our 
programmes, which have been 
running for 43 years and have 
garnered a US Presidential 
Award for Mentoring, among 
other honours. We aim to make 
students proficient in doing 
quality research experiments 
and in statistically analysing and 
publishing them. 

All new undergrads are 
trained by peer undergraduates 
(not graduate students or 
postdocs) experienced in the 
research, ensuring that the 
newcomers immediately feel 
comfortable in the research 
setting; their work is regularly
checked by senior staff. 

The burden of heavy course 
loads is mitigated by an open- 
lab policy that allows students to 
pursue their research out of hours 
and during university vacations. 

Our undergraduates have 
co-authored hundreds of 
publications and national 
presentations. And, to stimulate 
pre-college students’ interest in 
research, we established a journal 
of student research abstracts 
almost 20 years ago (now open 
access), and annual symposia 
for student research posters (see 
go.nature.com/dwox6d).

Steven B. Oppenheimer 
California State University, 
Northridge, California, USA. 
steven.oppenheimer@csun.edu 


Women’s grants lost 
in inequality ocean 


Denmark last year launched
its YDUN programme,
an experimental one-year
government research-funding
scheme specifically for women.
It was branded as sexist and
provoked a political squall,
so is unlikely to be repeated.
Our analysis indicates that
the 110 million krone (US$16 
million) allocated to YDUN is 
roughly the same as the shortfall 
in Danish grant money won by 
women compared with men 
every year over the past 10 years. 

The proportion of successful 
grant applications in 2009-13 
to the Danish Council for 
Independent Research (DFF), 
which also ran YDUN, was 
roughly comparable for 
male and female researchers 
according to their own analysis 
(14% and 11%, respectively; 
see go.nature.com/uryhca (in 
Danish)). 

However, our analysis of 
DFF data since the council's 
foundation in 2005 revealed 
that this 3-percentage-point difference in
success rates is significant: it 
corresponds to a male advantage 
of an average of 104 million 
krone per year, comparable 
to the entire YDUN funding
allocation for women. 

YDUN was a welcome attempt 
to widen Denmark’s talent pool, 
but managed to level the playing 
field for only one year, and only 
for the DFF. Even then, the
success rate for YDUN was only 
3% (17 of 553 applicants). This 
level of competition is much 
higher than for DFF funding. 
Even though YDUN funding 
effectively made up the shortfall 
within the DFF for 2014, women 
still had to compete much harder 
to get it. 
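The comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this letter (17 of 553 YDUN applicants funded; DFF success rates of 14% for men and 11% for women); the variable names are illustrative.

```python
# Figures quoted in the letter
ydun_funded = 17
ydun_applicants = 553
dff_rate_men = 0.14
dff_rate_women = 0.11

# YDUN success rate: 17 / 553, about 3.1%
ydun_rate = ydun_funded / ydun_applicants

# Absolute gap in DFF success rates, in percentage points
gap_points = (dff_rate_men - dff_rate_women) * 100

# The same gap expressed relative to the women's success rate
relative_gap = (dff_rate_men - dff_rate_women) / dff_rate_women

# How much lower the odds were for a YDUN applicant than for a woman applying to the DFF
competition_ratio = dff_rate_women / ydun_rate

print(f"YDUN success rate: {ydun_rate:.1%}")
print(f"DFF gap: {gap_points:.0f} points ({relative_gap:.0%} relative to women's rate)")
print(f"DFF women's rate is {competition_ratio:.1f} times the YDUN rate")
```

Seen this way, a gap of three percentage points is a difference of more than a quarter relative to the women's rate, which is why it adds up to a large sum over a decade.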

Darach Watson, Jens Hjorth 
Niels Bohr Institute, University 
of Copenhagen, Denmark. 
darach@dark-cosmology.dk


Assessing resistance 
to new antibiotics 


Losee Ling and colleagues detect 
no bacterial resistance to the new 
antibiotic molecule teixobactin 
(L. L. Ling et al. Nature 517, 
455-459; 2015), but this could be 
because the conditions of their 
test may limit its sensitivity (see
J. Ramsayer et al. Evol. Appl. 6,
608-616; 2013). ‘Evolutionary
rescue’ is a more powerful assay
for evaluating the probability of
resistance to novel antibiotics
in large bacterial samples, and
therefore for informing decisions 
about their usage. 

Evolutionary rescue assays 
can distinguish between 
resistant mutants that are 
present initially and those that 
emerge later (H. A. Orr and 
R. L. Unckless PLoS Genet. 10, 
e1004551; 2014). This type 
of assay can also be used to 
evaluate factors that contribute 
to the emergence of bacterial 
resistance, such as ‘horizontal’ 
gene transfer from other 
bacteria or the presence of 
bacterial ‘mutator’ strains with 
vastly increased mutation rates. 
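Why sample size matters for detecting resistance can be illustrated with a standard Poisson approximation. This is a textbook simplification, not the method of the cited papers: it assumes each cell independently carries a resistance mutation with a small probability, and the function name and numbers are hypothetical.

```python
import math


def p_at_least_one_resistant(n_cells, resistance_freq):
    """Probability that a sample contains at least one resistant mutant.

    n_cells:         number of bacterial cells tested (hypothetical)
    resistance_freq: per-cell probability of carrying resistance (hypothetical)
    Uses the Poisson approximation 1 - exp(-n * p), valid for small p.
    """
    expected_mutants = n_cells * resistance_freq
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-expected_mutants)


# With a per-cell frequency of one in a billion, a million-cell test
# will almost always miss resistance; a ten-billion-cell test will not.
small_sample = p_at_least_one_resistant(1e6, 1e-9)
large_sample = p_at_least_one_resistant(1e10, 1e-9)
```

Under this assumption, a test that finds no resistance in a small sample says little about what will happen in the much larger populations found in patients.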
Michael E. Hochberg Université 
Montpellier, France. 
mhochber@univ-montp2.fr 
Gunther Jansen Christian-Albrechts-Universität,
Kiel, Germany.
Yves Chauvin


(1930-2015) 


Nobel-prizewinning chemist who rearranged carbon-carbon bonds.


The impact of Yves Chauvin’s
work across the chemical industries is mind-boggling.
By dissecting how
carbon-carbon bonds shift in reac- 
tions of petroleum compounds, 
Chauvin revealed the steps in one 
of organic chemistry’s most impor- 
tant reactions: metathesis. This 
Nobel-prizewinning work laid 
the path for chemical processes 
that are now used to make every- 
thing from pesticides to drugs. His 
proudest achievement, however, 
was developing crucial processes in 
the oil and plastics industries, now 
used to produce millions of tonnes 
of compounds each year. 

Chauvin, who died on 
27 January, was born in 1930 in 
Belgium, close to the border with 
France. His French parents sent 
him across the border daily to 
primary school; when his family 
returned to France, Chauvin fin- 
ished his education in Paris. His 
summers were spent in a large family house
in Tours, in France’s Loire Valley, where he 
lived out the end of his life. 

After finishing his undergraduate degree 
in chemical engineering in 1954 at Lyon’s 
college of industrial chemistry (École Supérieure
de Chimie Industrielle de Lyon), he
began working at the French chemical com- 
pany Progil (now part of Sanofi), where he 
met his wife, Huguette Labarre. 

Chauvin said that he regretted that mili- 
tary service and other circumstances kept 
him from pursuing a PhD. But he also felt 
that not having one freed his mind to con- 
sider a broad range of topics. He resigned 
from Progil after two years because 
managers demanded that he simply copy 
procedures without exploring ideas from 
other fields. 

In 1960, he moved to his scientific home 
for the next 40 years, the French Institute of 
Petroleum (IFP) near Paris. Here, Chauvin 
devoted himself to accelerating the pro- 
duction of chemicals by a process known as 
homogeneous catalysis. In this procedure, 
all components are dissolved in a solution, 
enabling fine control and the ability to work 
with large volumes of chemicals at relatively 
low temperatures. He bucked the trend in 
petrochemistry that then favoured catalysis 
on solid substrates, a technique that requires
higher temperatures and often produces
toxic by-products. 

The work led to the invention of pro- 
cesses that are now central to the petro- 
chemical industry. The dimersol and 
difasol processes pair smaller hydrocarbons 
to ‘octane boosters’ added to petrol or, in 
a modified version, to the starting mate- 
rial for plasticizers, additives that increase 
the plasticity or fluidity of a material. The 
alphabutol process combines carbon mol- 
ecules to make feedstocks and additives for 
everything from lubricants to plastics. 

Chauvin continued to develop homoge- 
neous catalysis. He solved a major prob- 
lem: the separation of the catalyst from 
the reaction medium, drawing on his 
knowledge of batteries to develop ionic 
liquids as new solvents. 

Although Chauvin’s name became syn- 
onymous with the process of homogeneous 
catalysis, he is best known for working out 
the steps of the intricate molecular dance 
known as olefin metathesis. Here, fragments 
of olefins — molecules containing double- 
bonded carbon atoms — swap places with 
each other, much as dancing couples swap 
partners. The genius of Chauvin was to 
co-opt ideas from a very different chemical 
process called ring-opening polymerization. 

In a simple but ground-breaking 
experiment, he reacted two types
of olefin (cyclic and non-cyclic) 
to show that the resulting prod- 
ucts combined fragments of both. 
He deduced that the molecular 
swaps were not symmetrical, as 
was commonly assumed, but were 
orchestrated by the formation of a 
temporary hydrocarbon ring con- 
taining the metal catalyst. Other 
chemists, notably Robert Grubbs 
and Richard Schrock, with whom 
Chauvin shared the 2005 Nobel 
Prize in Chemistry, used this 
insight to improve and develop 
industrial reactions that under- 
pin much of ‘green’ chemistry — 
efficient industrial processes that 
produce little waste. 

When Chauvin retired from the 
IFP in 1995, he came to work in 
my surface organometallic chem- 
istry laboratory at the University 
of Lyon. He would regularly catch 
the 5 a.m. train from his home in
Tours to be at the bench by 8 a.m. 

Chauvin’s later collaborations adapted 
homogeneous catalysis to ionic liquids, 
which can be effectively applied to a vari- 
ety of reactions, and their products — used 
for many industrial purposes — are readily 
isolated from the catalyst. 

Yves’s scientific virtuosity was tempered 
with humility. He was reluctant to go to 
Stockholm in 2005 because he felt that his 
contribution was less than that of Grubbs 
and Schrock, who made metathesis reac- 
tions broadly practical. He balanced fun- 
damental and applied research, producing 
more than 100 papers and 130 patents. 

Yves was always young at heart. He 
never missed his 16-kilometre weekly 
hike or failed to read the weekly edition 
of Chemical Abstracts. His deep curiosity 
was equalled by a knowledge and intuition 
that made him a fantastic inventor. Yves 
is deeply missed, both as a friend and as a
great scientist.


Jean-Marie Basset is director of the KAUST 
Catalysis Center at the King Abdullah 
University of Science and Technology
in Thuwal, Saudi Arabia. He ran the
laboratory at the University of Lyon in 
France where Yves Chauvin worked from 
1996 to 2009. 

e-mail: jeanmarie.basset@kaust.edu.sa
For News & Views online, go to
nature.com/newsandviews 


PLANETARY SCIENCE 


Enceladus’ hot springs 


The detection of silicon-rich particles originating from Saturn’s moon Enceladus suggests that water-rock interactions are 
currently occurring inside it — the first evidence of ongoing hydrothermal activity beyond Earth. SEE LETTER P.207 


GABRIEL TOBIE 


Many planetary bodies are thought to
have produced hydrothermal activity
(interactions between water and rock)
as a result of hot-water circulation
during the early stages of the Solar System,
but Earth was the only one known to be sus- 
taining such activity today. Then, in 2005, the 
Cassini spacecraft discovered eruptions of 
water vapour and ice emanating from long, 
warm fractures on the south pole of Saturn's 
moon Enceladus. The detection of salted, icy 
grains in Enceladus’ erupting plume' clearly 
pointed to an ocean environment below its 
icy crust and to the leaching of rocks by warm 
water, at least in the past. On page 207 of this 
issue, Hsu et al.’ report hints of presently active 
hydrothermal processes on Enceladus. 

This story began about a decade ago during 
Cassini’s approach to Saturn, when one of the 
spacecraft’s instruments detected tiny dust 
particles, called stream particles, escaping into 
interplanetary space from the Saturn system’. 
Analysis of these particles revealed that they 
were mostly nanometre-sized and rich in sili- 
con, in contrast to the ice-rich particles preva- 
lent in the Saturnian environment. The origin 
of these particles has remained enigmatic 
for years. 

Building on their earlier modelling work’, 
Hsu and colleagues simulated the dynamics 
of the particles’ ejection, tracking them back 
to their most probable source region: Saturn's 
E ring, a tenuous ring mostly made of small 
ice grains, extending between the orbits of the 
moons Mimas and Titan. Because Enceladus 
is the source of particles in the E ring’, it must 
also be the ultimate source of the silicon-rich 
stream particles, which were presumably once 
incorporated in icy grains. 

By analysing mass spectra of the stream 
particles, the authors concluded that the dominant
constituent is silica (SiO₂). This is much
more probable than pure silicon or silicon 
carbide (SiC), two other potential candidates. 
Silica is extremely common on Earth, occur- 
ring mostly in the natural form of quartz. But 
finding silica nanoparticles in the Saturnian 
environment is unexpected. Hsu et al. ruled 
out fragmentation of larger grains as a possible 
process to explain the narrow size distribution 


Figure 1 | The Lost City hydrothermal field under the mid-Atlantic Ocean. These limestone 
chimneys, which are up to 60 metres tall, vent fluids at a temperature of 90°C. Hsu et al.” report evidence 
of a similar aqueous environment in Saturn's icy moon Enceladus.


of the stream particles. The composition and 
size distribution must therefore be inher- 
ited from the particles’ formation process, 
which seems most likely to have been fast 
crystallization of silica nanoparticles from 
supersaturated aqueous solutions. 

Using laboratory experiments, the authors 
finally showed that silica particles with the 
observed size distribution can be produced 
only under rather specific thermo-physical 
conditions, thus constraining the thermal state 
of Enceladus’ interior. Specifically, a region of 
the rock core must have a temperature of at 
least 90°C and be in contact with water of pH 
greater than 8.5 to dissolve silica in sufficient 
amounts; the oceanic salinity should be less 
than 4% and oceanic pH in the range of 8.5 
to 10.5, to allow the formation of numerous 
nanometre-sized silica grains. 
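The constraints listed above can be collected into a single check. The thresholds are those quoted in this article; the function and parameter names are hypothetical and serve only to make the set of conditions explicit.

```python
def allows_silica_nanoparticles(core_temp_c, core_water_ph,
                                ocean_salinity_pct, ocean_ph):
    """Return True if a model interior meets the conditions quoted in the text.

    core_temp_c:        temperature of the rock-core region, in degrees Celsius
    core_water_ph:      pH of the water in contact with that region
    ocean_salinity_pct: oceanic salinity, in per cent
    ocean_ph:           pH of the ocean
    """
    dissolves_enough_silica = core_temp_c >= 90.0 and core_water_ph > 8.5
    forms_nanometre_grains = ocean_salinity_pct < 4.0 and 8.5 <= ocean_ph <= 10.5
    return dissolves_enough_silica and forms_nanometre_grains


# A warm, alkaline, low-salinity interior passes; a cold core does not.
warm_core = allows_silica_nanoparticles(90.0, 9.0, 2.0, 9.5)
cold_core = allows_silica_nanoparticles(60.0, 9.0, 2.0, 9.5)
```

Written this way, it is clear that the particle observations constrain four quantities at once, two for the core region and two for the ocean.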

The inferred core temperature is unexpect- 
edly high for a body the size of Enceladus 
(approximately 500 kilometres in diameter), 
especially given that deep water circulation 
should efficiently cool the core’. A strong heat 
source must exist to raise the core tempera- 
ture above 90°C — most probably tidal fric- 
tion, and possibly also exothermic water-rock 
reactions known as serpentinization reactions’.
But modelling is needed to determine whether 
tidal flow and serpentinization in the core 
could provide sufficient energy at present to 
allow hydrothermal activity, and, if so, for 
how long. 

Intriguingly, the conditions inferred by 
Hsu and colleagues in Enceladus’ water-rock 
system are similar to those found on Earth 
in an atypical hydrothermal field called 
Lost City (Fig. 1), which was discovered in 
the early 2000s in the mid-Atlantic Ocean*”. 
This hydrothermal field consists of limestone 
chimneys 60 metres tall, which vent metal- 
poor, basic fluids (pH 10-11) at a tempera- 
ture of 90°C; the fluids are rich in hydrogen, 
abiotically produced methane and other 
organic compounds. For comparison, most 
other known fields are fuelled by acidic 
(pH 3-5), metal- and sulfide-rich fluids at 
temperatures greater than 300°C (ref. 8). 

Because it is relatively cold, Lost City has 
been posited’ as a potential analogue of hydro- 
thermal systems in active icy moons. The 
current findings confirm this. What is more, 
alkaline hydrothermal vents might have been 
the birthplace of the first living organisms on 
the early Earth, and so the discovery of similar
environments on Enceladus opens fresh per- 
spectives on the search for life elsewhere in the 
Solar System. 

Hsu et al. also conclude that the silica particles 
must be transported from the core hydro- 
thermal source to the plume source near the 
surface in a fairly short time — from months to 
years at most — to limit the particles’ growth. 
This implies that samples of materials erupted 
from Enceladus’ warm fissures would pro- 
vide a unique opportunity to directly probe 
aqueous, possibly prebiotic, processes occur- 
ring deep in Enceladus’ rock core, in almost 
real time. Cassini’s discoveries, together
with Hsu and colleagues’ findings, point to
potentially complex chemical processes in 
Enceladus’ watery interior. Cassini will fly 
through the moon’s plume again later this 
year, but only future missions that can under- 
take improved in situ investigations'”"', and 
possibly even return samples to Earth", will 
be able to confirm Enceladus’ astrobiologi- 
cal potential and fully reveal the secrets of its 
hot springs.
Disarming Wnt


The secreted enzyme Notum has been found to inhibit the Wnt signalling 
pathway through removal of a lipid that is linked to the Wnt protein and that is 
required for activation of Wnt receptor proteins. SEE ARTICLE P.187 


ROEL NUSSE 


Cells signal to one another through secreted
molecules that are conserved across the
evolutionary spectrum. One class of
these signals is Wnt proteins, which influence 
the balance between proliferation and differen- 
tiation in many cell types, including stem cells’. 
Because this balance is crucial for normal 
tissue maintenance, and overactivation of Wnt 
signalling can cause cancer, the activity of Wnt 
signals is tightly controlled by various extracel- 
lular molecules. In this issue, Kakugawa et al.’ 
(page 187) describe an unexpected mechanism 
by which Wnt signals can be downregulated, 
showing how an extracellular enzyme called 
Notum renders Wnt inactive. 

Detailed biochemical, structural and genetic 
experiments’ reveal that Wnt signalling mech- 
anisms are built from unusual elements. When 
Wnt proteins are made, an acyl group from 
palmitoleic acid (a monounsaturated form of 
the lipid palmitic acid) is attached at an evolu- 
tionarily conserved serine amino-acid residue, 
through a carboxyl ester link**. This modifica- 
tion is made by Porcupine, an enzyme located 
in a cellular substructure called the endoplasmic
reticulum’. Such palmitoleoylation is
essential for Wnt activity, because it is the acyl 
group that binds to Frizzled® — the transmem- 
brane receptor protein for Wnt — through a 
hydrophobic cavity in the receptor on target 
cells. Wnt-Frizzled binding is imperative for 
receptor activation, and triggers many events 
in the cell, from modulating gene expression 
to changing cell shape. 

As with many components of the Wnt 
signalling pathway, Notum was originally
discovered in fruit flies, in screens for genes
that interact with the Wnt protein Wing- 
less”*. Loss of Notum in flies leads to abnor- 
mal wing growth, indicating that Wnt 
signalling (which drives wing growth and 
patterning) becomes unrestrained in its 
absence. Wnt signals also turn on the expres- 
sion of the gene that encodes Notum, leading 
to negative feedback regulation that intrinsi- 
cally limits signalling, as is often the case for 
such pathways. Initial studies’”’ suggested
that Notum might act as a phospholipase 
enzyme, cleaving the link between membrane- 
bound glycoproteins called GPI anchors and 
glypicans — large polysaccharides that form 
complexes with extracellular molecules such 
as Wnt. Cleavage releases glypicans into the 
extracellular space, decreasing their ability to 
restrain Wnt activity. 

Kakugawa et al.’ unveil a different, previously 
undocumented function of Notum. The 
authors start with a structural analysis of 
human and fly Notum, and find that the protein 
has the overall structure of a hydrolase enzyme. 
But it also has a large hydrophobic cavity of 
around 380 cubic ångströms, which in theory
could provide sufficient space for binding by 
acyl groups with chains of up to 16 carbons — 
the length of palmitoleic acid. Furthermore, 
the researchers’ analysis of binding between 
Notum and acyl groups of various lengths and 
Figure 1 | Notum shoots the messenger in Wnt signalling. In Wnt-producing cells, the Wnt protein is
made in a cellular compartment called the endoplasmic reticulum. There, an acyl group from palmitoleic
acid is added to Wnt by the membrane-spanning enzyme Porcupine. The Wntless protein then transports
palmitoleoylated Wnt out of the cell. Secreted Wnt binds to its receptor protein Frizzled, which spans
the membrane of Wnt target cells. This binding depends on the acyl group in Wnt, and triggers an
intracellular signalling cascade. Kakugawa et al.’ report that the Wnt-Frizzled interaction is inhibited by
the extracellular enzyme Notum, which specifically removes the acyl group from Wnt.
degrees of saturation reveals that, of the longer-chain
molecules, only monounsaturated molecules
can bind. In other words, the acyl group
found on Wnt can form a complex with Notum, 
whereas lipids with different configurations 
cannot. In parallel with their binding assays, 
Kakugawa and colleagues show that Notum 
enzymatically removes the acyl group from 
Wnt, thereby rendering the protein inactive. 
Such an extracellular deacylase activity has 
never been previously reported. 

Hedgehog is another signalling molecule 
whose activity is modified by lipids. But the 
authors demonstrate that, unlike Wnt, Hedge- 
hog is not a substrate for Notum. The specific- 
ity of Notum for monounsaturated acyl groups 
provides an explanation for this discrepancy, 
because the acyl group attached to Hedgehog 
contains saturated carbon bonds throughout. 
Kakugawa and co-workers also provide genetic 
evidence that, in flies, Notum does not interact 
with Hedgehog signalling in vivo. Finally, they 
show that Notum contains binding sites for 
polysaccharides such as glypican sugar chains, 
inviting speculation that glypicans bring 
together Notum and Wnt — thus modulating 
the enzymatic interaction of Notum with Wnt, 
rather than acting as a substrate for Notum to 
cleave GPI anchors. 

Kakugawa and colleagues’ discovery adds 
greatly to our understanding of Wnt signal- 
ling, and of the central role of the Wnt lipid 
group. The authors’ results demonstrate how 
acquisition or loss of the acyl group from pal- 
mitoleic acid can adroitly control the activation 
or deactivation of Wnt signals. The transmem- 
brane protein Wntless conveys Wnt molecules 
that have been palmitoleoylated by Porcupine 
through the cell for secretion’. Once secreted, 
Wnt proteins bind to Frizzled on other cells 
through the acyl group (Fig. 1). 

All of these lipid-related pathway compo- 
nents, including Notum, evolved at around 
the same time as Wnt. The Wnt protein itself 
contains a lipid-binding motif called a saposin 
fold, and it has been speculated" that, when 
Wnt signals initially evolved, they consisted of 
a lipid-protein complex, with the two becom- 
ing covalently linked at a later date. Lipids lie 
at the heart of Wnt signalling, and can even 
be viewed as a primordial cell-fate signal 
because they are also used by organisms such 
as choanoflagellates, which are located at the 
base of the animal evolutionary tree”. 

Because enzymes are often good targets for 
drugs, it might be possible to identify molecules 
that inhibit the activity of Notum, thereby 
increasing the strength of Wnt signalling. Wnt 
proteins can stimulate stem cells to proliferate, 
so such an approach could have therapeutic 
value for treating degenerative diseases. Col- 
lectively, these findings explain how Notum 
prevents tissues from growing abnormally or 
adopting aberrant identities — it shoots the 
messenger in the Wnt pathway by stripping 
Wnt proteins of their crucial lipid group.
Fitness tracking for
adapting populations 


A method for tracking the descendants of hundreds of thousands of yeast cells in 
an evolving population reveals that thousands of individuals contribute to early 
increases in population-wide fitness. SEE ARTICLE P.181 


DAVID GRESHAM 


Positive selection for genetic variants
that benefit an organism in a particular
environment, a process called adaptive
evolution, affects all species. As such, know- 
ing how frequently beneficial mutations occur, 
and quantifying the selective advantage they 
confer — their fitness — has been a long- 
standing goal for evolutionary biology’. On 
page 181 of this issue, Levy et al.” describe a 
method for tracking individual genetic vari- 
ants in an evolving population, and measuring 
their fitness and fate as the population adapts 
to the environment. 

Individuals descended from a common 
progenitor are said to be of the same genetic 
lineage. Levy and colleagues tracked indi- 
vidual lineages in yeast (Saccharomyces 
cerevisiae) with extremely high resolution by 
introducing hundreds of thousands of unique, 
random DNA sequences into individual yeast 
cells that have otherwise-identical genomes. 
These sequences, called barcodes, have no 
impact on the cell, but can be used to distin- 
guish between different individuals by means 
of DNA sequencing. Individuals that have the 
same barcode are part of the same lineage, 
allowing estimation of how many cells in the 
population are descended from a common 
ancestor. 
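The counting step can be sketched in a few lines. This is a minimal illustration that assumes the sequencing reads have already been reduced to clean barcode strings; real data would also need error correction, and the function name and example barcodes are hypothetical.

```python
from collections import Counter


def lineage_frequencies(barcode_reads):
    """Map each barcode to the fraction of reads that carry it.

    barcode_reads: iterable of barcode strings, one per sequenced read.
    Reads sharing a barcode are treated as members of the same lineage.
    """
    counts = Counter(barcode_reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n / total for barcode, n in counts.items()}


# Four reads, three lineages: one lineage accounts for half of the sample.
example = lineage_frequencies(["ACGT", "ACGT", "TTAG", "GGCA"])
```

Repeating the count at successive time points gives one frequency trajectory per lineage.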

After barcoding the yeast cells, the authors 
studied the population as it underwent adap- 
tive evolution over many generations in a 
simple environment. In the evolving popu- 
lation, each daughter cell is born through 
cell division and so is a clone of its mother. 
Thus, sexual reproduction plays no part in the
population's evolutionary dynamics. Although 
all cells in the population start out with identi- 
cal genomes (apart from the barcodes), genetic 
diversity is introduced by random mutations 
that arise spontaneously when DNA is rep- 
licated during cell division. If a mutation is 
beneficial in the environment, allowing the cell 
Figure 1 | Get fit or die trying. Levy et al.’
labelled hundreds of thousands of individual yeast
cells, and tracked the population as it evolved.
Each lineage is initially present in approximately
equal numbers. In the first, predictable phase
of evolution, thousands of lineages that acquire
beneficial mutations (blue, red) expand, increasing
the fitness of the population and leading to the
decline of lineages that did not acquire beneficial
mutations (black). In a second, less predictable
phase of evolution, even lineages with beneficial
mutations can decline (blue), as those containing
mutations that confer an exceptionally high degree
of fitness, and that arose early enough, continue
to expand (red), further increasing population
fitness. (Adapted from Fig. 1a of ref. 2.)


and its descendants to proliferate more rapidly, 
that lineage will begin to increase in relative 
abundance in the population. By sequencing 
the cells’ molecular barcodes at different time 
points throughout the experiment, beneficial 
lineages can be identified. 

Levy and colleagues used their high- 
resolution lineage-tracking technique to 
quantify the fitness of each beneficial lineage, 
and to determine when the correspond- 
ing mutation occurred in the population’s 
history. They found that, in evolving yeast 
populations containing 70 million cells, about 
25,000 lineages showed fitness increases of 
more than 2% after just over 100 generations. 
Many of these lineages were present at frequen- 
cies lower than 0.001%. This means that there 
are initially many more competing lineages 
containing beneficial mutations in evolving 
populations than previously revealed by 
whole-population sequencing’. 
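
To make the idea of reading fitness from barcode counts concrete, the following minimal Python sketch estimates, for each barcode, the per-generation change in log frequency between two sequencing time points. The barcode names, read counts, number of generations and the 2% cut-off are illustrative assumptions, and the simple log-ratio estimator is a stand-in rather than the statistical method used by Levy and colleagues.

```python
import math


def barcode_frequencies(counts):
    """Convert raw barcode read counts into relative frequencies."""
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


def log_frequency_slopes(counts_early, counts_late, generations):
    """Per-generation change in log frequency for each barcode.

    A positive value means the lineage grew faster than the population
    average over the interval; a negative value means it grew more slowly.
    """
    f_early = barcode_frequencies(counts_early)
    f_late = barcode_frequencies(counts_late)
    slopes = {}
    for barcode, f0 in f_early.items():
        f1 = f_late.get(barcode, 0.0)
        if f0 > 0 and f1 > 0:  # skip barcodes that were not observed twice
            slopes[barcode] = math.log(f1 / f0) / generations
    return slopes


# Hypothetical read counts at two time points, 16 generations apart.
early = {"bc1": 1000, "bc2": 1000, "bc3": 1000}
late = {"bc1": 500, "bc2": 1000, "bc3": 4000}

slopes = log_frequency_slopes(early, late, generations=16)
# Lineages whose frequency rises by more than 2% per generation.
expanding = sorted(bc for bc, s in slopes.items() if s > 0.02)
print(expanding)
```

Because the estimate is relative to the rest of the population, a barcode whose read count stays constant (bc2 here) still shows a negative slope once a fitter lineage expands.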

The aggregate effect of these thousands of 
beneficial lineages is to push the population 
fitness higher and higher. In doing so, a process 
of sequential purging occurs. First, the lineages 
that did not acquire a beneficial mutation 
are removed from the population. Then, as 
population fitness continues to increase, even 
lineages that contain beneficial mutations are 
purged once their individual fitness is less than 
that of the population as a whole. 
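
The sequential purging described above can be sketched with a small deterministic model, assuming three hypothetical lineages (neutral, modestly beneficial and strongly beneficial) that grow in proportion to their fitness relative to the population mean. The fitness values and starting frequencies are invented for illustration, and the model ignores new mutations and random drift.

```python
def simulate_lineages(frequencies, fitness_effects, generations):
    """Track lineage frequencies under selection relative to mean fitness."""
    history = [list(frequencies)]
    for _ in range(generations):
        # Mean fitness of the population in this generation.
        mean_w = sum(f * (1 + s) for f, s in zip(frequencies, fitness_effects))
        # Each lineage grows or shrinks according to its fitness relative to the mean.
        frequencies = [f * (1 + s) / mean_w
                       for f, s in zip(frequencies, fitness_effects)]
        history.append(frequencies)
    return history


# Hypothetical lineages: neutral, modest benefit (3%), strong benefit (10%).
fitness_effects = [0.0, 0.03, 0.10]
start = [0.98, 0.019, 0.001]

history = simulate_lineages(start, fitness_effects, generations=300)
modest = [generation[1] for generation in history]
peak_generation = modest.index(max(modest))

print("modest lineage peaks at generation", peak_generation)
print("final frequencies:", history[-1])
```

In this toy model the neutral lineage declines from the start, whereas the modestly beneficial lineage first expands and then declines once the mean fitness of the population exceeds its own.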

Levy and colleagues’ study shows that there 
are two distinct phases in the adaptive evolu- 
tion of a large cell population (Fig. 1). In the 
first phase, population fitness increases in a 
predictable manner. This increase is attribut- 
able to the cohort of thousands of different 
lineages with beneficial mutations, and 
depends on the size of the population and the 
fitness associated with each mutation. The 
second phase is less predictable. The ultimate 
‘winners’ must have higher fitness than the 
overall population and the mutations must 
have been introduced early enough in the 
population's history to establish themselves 
— this phase is unpredictable because such 
mutations are rare. 

The ability to quantify the fitness of each 
beneficial mutation in a population enabled 
Levy and co-workers to measure the range of 
fitnesses conferred by beneficial mutations. 
Theory predicts®” that the distribution of 
fitness effects associated with new mutations 
has a particular mathematical shape, known 
as an exponential distribution. However, the 
authors find that this is not the case, at least 
not in this environment. Instead, they observe 
a complicated distribution of fitness effects 
that seems to be composed of a mixture of 
distributions, which may reflect beneficial 
mutations in different genes. The nature 
of the distribution of fitness effects of ben- 
eficial mutations is central to understanding 
and simulating adaptive evolution in future 
experiments. As such, the ability to empirically
measure this distribution with precision
provides opportunities to reconcile theory
and data.

Despite the power of Levy and co-workers’
technique, several limitations remain. First,
the method does not actually identify the 
beneficial mutations, a key requirement for 
understanding the molecular basis of adapta- 
tion®*’. Second, it tells us about the distribu- 
tion of fitness effects for beneficial mutations, 
which are most relevant to the evolution of 
large asexual populations, but not those for 
neutral or deleterious mutations, which may be 
important in populations that are small, sexual 
or have high mutation rates. Last, and crucially, 
the method in its current form allows identi- 
fication of only the earliest stages of adaptive 
evolution. Once a single lineage has swept to 
high frequency in the population, its barcode 
will be abundant. Loss of barcode diversity 
limits the ability to detect a second beneficial 
mutation within these lineages, a problem that 
could be overcome by somehow regenerating 
the diversity of barcodes during the course of 
the experiment. 

The ability to track hundreds of thousands 
of individual lineages in a population is an 
exciting tool that allows us to address many 
questions in adaptive evolution. Levy et al. 
performed their experiment using enormous 
populations, ensuring an ample supply of 
mutations. However, studying the dynamics 
of adaptation in much smaller populations 
would also be informative, and will probably 
result in less-predictable outcomes in the early 
stages of adaptation. Furthermore, studying 
adaptation in different environments and dif- 
ferent genetic backgrounds will be crucial for 
assessing the generality of the results. Appli- 
cation of high-resolution lineage tracking in 
other organisms may be useful for understand- 
ing the evolutionary dynamics of antibiotic 
resistance in pathogens and the evolution of 
human tumours. The ability to observe evolu- 
tion in action with high resolution is certain to 
reveal unanticipated features of the universal 
force of adaptive evolution. ■
50 Years Ago

The completion of the Flora
URSS is a scientific event of great
significance not only to botanists
of the Soviet Union ... During
the War work was almost entirely
suspended as most of the authors
were evacuated from Leningrad.
However, incredible efforts were
made to continue the work. Thus,
late in the autumn of 1941 in the
besieged city ... an attempt was
made to print Volume 11.
B. A. Tikhomirov ... obtained
the necessary amount of paper and
... this volume was printed.
N. F. Goncharov, already desperately
weakened by starvation, proceeded
with the account of the genus
Astragalus which made up Volume 12.
Later that winter this account
was defended as his thesis for the
degree of doctor of biology, and
in February 1942 Goncharov died
of hunger ... Thus, thirty-three
years of work and the participation
of about a hundred authors were
required for the completion of ... a
Flora of 30 volumes. We remember
all our colleagues, many of them
long dead, who contributed to its
achievement. We have done what
we could. We welcome the young
botanists and wish them success.
From Nature 13 March 1965


100 Years Ago 


Insects Injurious to the Household 
and Annoying to Man. By Prof. G.
W. Herrick; The House-Fly, Musca
domestica, Linn. Its Structure, Habits, 
Development, Relation to Disease and 
Control. By Dr. C. G. Hewitt — In 
addition to insects in the zoological 
sense of the term, spiders, mites, 
ticks, solpugids, scorpions, and 
centipedes are passed in review, and 
the British reader cannot but feel 
that some compensation for not 
being an American is afforded by the 
comparatively scanty house-fauna of 
his native land. 

From Nature 11 March 1915 
How bacteria get spacers from invaders


Bacteria use CRISPR-Cas systems to develop immunity to viruses. Details of how 
these systems select viral DNA fragments and integrate them into bacterial DNA to 
create a memory of invaders have now been reported. SEE ARTICLES P.193 & P.199 


IDO YOSEF & UDI QIMRON 


Less than a decade ago, immunological
memory was regarded as a feature
unique to vertebrates; scientists ridiculed
the idea that bacteria might be able to
‘remember’ viruses that attack them. Yet the
almost inconceivable concept of bacterial
immunological memory has since been
shown to exist after all’*. Two papers in this
issue, by Heler et al. (page 199) and Nuñez
et al. (page 193), report major advances in our
understanding of the molecular mechanism of
this phenomenon.

Bacteria remember their viral invaders by
sampling short DNA sequences known as
protospacers from the viruses’ genetic material.
These sequences become integrated into
the bacterium’s own DNA, specifically into an
array of repeat sequences called clustered regularly
interspaced short palindromic repeats
(CRISPRs; Fig. 1); the integrated sequences
are called spacers. When a bacterium is subsequently
attacked by a recognized virus, the
spacers are transcribed from the array and
used to guide a complex containing CRISPR-associated
(Cas) proteins, which cleave protospacers
in viral nucleic-acid molecules’.

Accidental destruction of the CRISPR
array could occur if transcribed spacers
guide Cas proteins to cleave it, leading to
catastrophic degradation of bacterial genetic
material. To prevent this potential autoimmunity,
some bacteria have CRISPR-Cas
systems that cleave DNA targets only if they are
flanked by sequences known as protospacer
adjacent motifs (PAMs)°. The repeat sequences
that flank spacers in such CRISPR arrays lack
PAMs and therefore cannot be cleaved (Fig. 1).

The mechanism by which spacers are chosen
so that they target only PAM-associated
protospacers has remained elusive. Heler and
colleagues show that the Cas9 protein in two
species of Streptococcus bacterium selects for
spacers that have the correct PAM. Before now,
the protein’s main known role was cleaving
targeted DNA.

When the authors exchanged Cas9 proteins
for others that had different PAM specificities,
they found that the PAM sequence of the
acquired spacers changed accordingly. These
CRISPR-Cas systems therefore efficiently use
Cas9’s ability to recognize PAM sequences 
for memory as well as for cleavage, instead of 
having dedicated memorizing proteins develop 
PAM recognition from scratch. In other types 
of CRISPR-Cas system, such as that found in 
Escherichia coli, the memorizing proteins have 
an intrinsic ability to select, at least partially, 
for PAM-encoding protospacers through an 
as-yet-unknown mechanism’. 
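
The role of the PAM in both memorization and cleavage can be illustrated with a toy Python sketch. It assumes a three-base motif of the form NGG (any base followed by two guanines), of the kind recognized by Cas9 from Streptococcus pyogenes; the DNA sequences, the repeat sequence and the function names are hypothetical, and strand complementarity is ignored for simplicity.

```python
PAM_LENGTH = 3


def has_pam(flank):
    """True if the flanking bases match the assumed NGG motif."""
    return len(flank) == PAM_LENGTH and flank[1:] == "GG"


def candidate_protospacers(dna, spacer_length):
    """Return every stretch of dna that is immediately followed by a PAM."""
    hits = []
    for i in range(len(dna) - spacer_length - PAM_LENGTH + 1):
        end = i + spacer_length
        if has_pam(dna[end:end + PAM_LENGTH]):
            hits.append(dna[i:end])
    return hits


def is_cleavable(dna, spacer):
    """True if the spacer sequence occurs in dna with a PAM directly after it."""
    start = dna.find(spacer)
    while start != -1:
        end = start + len(spacer)
        if has_pam(dna[end:end + PAM_LENGTH]):
            return True
        start = dna.find(spacer, start + 1)
    return False


# Hypothetical viral DNA containing a single NGG motif.
viral_dna = "TTACGATCCATAGGTCA"
spacers = candidate_protospacers(viral_dna, spacer_length=8)

# Hypothetical CRISPR array: the acquired spacer flanked by repeats that lack a PAM.
repeat = "GTTTTAGA"
crispr_array = repeat + spacers[0] + repeat

print(spacers)
print(is_cleavable(viral_dna, spacers[0]))
print(is_cleavable(crispr_array, spacers[0]))
```

In this sketch the same PAM check is reused for selecting spacers and for licensing cleavage, which mirrors the economy the text attributes to Cas9. The spacer stored in the array is spared because the repeat that follows it does not match the motif.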

Heler and co-workers went on to show that 
Cas9 is required not only for determining the 
PAM sequence of the acquired spacers, but 
also for integrating spacers into CRISPRs. 
This feature is peculiar to Cas9 — the proteins 
that cleave nucleic acids in other CRISPR-Cas 
systems studied are not required for inte- 
gration’, but may enhance it under certain 
conditions’. 

These findings add crucial details to the 
mechanism of molecular memorization 
revealed by in vivo studies*’ °. Each spacer 
is integrated into the CRISPR array with a 
new repeat; it is known that newly integrated 
repeats maintain the sequence of the existing 


repeat on the other side of the new spacer’, 
and that the integration of new spacers prob- 
ably occurs by separation of the two DNA 
strands of this repeat’. But more information 
is needed, and an in vitro system that allows 
further mechanistic details to be uncovered 
has long been awaited. 

Nuñez and colleagues have established
just such a system: it is composed of E. coli 
memorizing proteins, a supercoiled plasmid 
DNA as the spacer-acceptor molecule and 
a double-stranded (ds) DNA that serves as a 
spacer-donor molecule. The researchers first 
demonstrated the validity of their system by 
using it to corroborate many of the in vivo 
characteristics of the CRISPR memorization 
process. They went on to analyse high-through- 
put sequencing of spacers inserted in vitro, and 
show that the memorizing proteins integrate 
spacers in the correct orientation by recogniz- 
ing a specific nucleotide base in the PAM. 

Importantly, their system allows the spacer 
donor to be easily replaced with DNA that 
has different sequences, end modifications 
and strand compositions, and thus enables 
the influence of these features on spacer integration
to be studied. In this way, Nuñez et al.
show that hydroxyl (OH) groups at the 3’ ends 
of dsDNA substrates are essential for integra- 
tion. On the basis of this requirement, and of 
characterization of intermediates identified 
in vivo’, the authors propose a highly plausi- 
ble model for spacer insertion. In this model, 
the memorizing enzymes catalyse bond formation
between the 3’ end of a preferred strand
of the spacer and a particular strand at the
end of a repeat. This is followed by the formation
of another bond between the 3’ end of
the complementary strand of the spacer and the
complementary strand at the other end of
the repeat (see Fig. 5 of the paper).

Figure 1 | The bacterial immune response. a, When bacteria are invaded by viral DNA, memorizing
protein complexes of the CRISPR-Cas immune system select short sequences (protospacers) of the
foreign DNA and integrate them as spacer sequences into their own chromosome. The spacers are
integrated into an array of repeat sequences called clustered regularly interspaced short palindromic
repeats (CRISPRs); different spacers are shown in orange, pink and red. b, If the bacterium is
subsequently exposed to previously encountered DNA, a transcript of the spacer guides a cleavage
protein complex to cut out the protospacer. In some types of CRISPR-Cas system, cleavage occurs
only if the protospacer is flanked by a protospacer adjacent motif (PAM), and the CRISPR array is not
cleaved because the spacers lack PAMs. Heler et al. and Nuñez et al. report details of the molecular
mechanisms by which CRISPR-Cas systems select and integrate spacers into bacterial DNA.

The strength of in vitro approaches to study- 
ing biological systems is that all the compo- 
nents are artificially added to the reaction; the 
requirements and features of each component 
can therefore be defined and manipulated. But 
differences from physiological activity may 
occur, stemming either from the use of a dif- 
ferent chemical environment from that found 
in vivo or from the absence of regulatory ele- 
ments. Such elements may not be essential for 
the generation of the end product of a reaction,
but might have a key role in the physiological 
process. 

Nuñez and co-workers report just such a
difference. In vivo studies have revealed that 
spacer integration occurs predominantly at 
the first repeat of the CRISPR array””. By con- 
trast, the authors observe that spacer insertion 
in their in vitro system is also distributed near 
other repeats, and even outside the CRISPR 
array. The researchers suggest that this might 
represent a physiological way of generating 
new arrays. This is a valid possibility, but regulatory
elements in vivo or in physiological con-
ditions probably often restrict this distribution 
and direct integration in a specific location. 

The authors also report that PAM-encoding 
spacer donors are not preferred substrates for 
integration in vitro, as opposed to what has 
been seen in vivo’. Moreover, they observe 
that the length of integrated spacers may vary 
substantially, whereas spacers in naturally 
occurring arrays have a strictly defined length. 
These differences might be explained by the 
fact that the in vitro system simulates only the 
last stage of spacer integration; earlier steps in 
the natural process probably account for the 
PAM preference and for defined spacer lengths 
observed in vivo. 

The differences in the in vivo and in vitro 
studies nevertheless highlight the cardinal 
question of what determines the constant 
length of newly acquired spacers in vivo. Is it 
dictated by a protein complex that hands the 
processed spacer to the memorizing enzymes? 
If so, then what are these proteins? An in vitro 
system composed of all of the elements that 
catalyse every step of the reaction is needed to 
address these issues. 

PAMs prevent autoimmunity against the 
CRISPR array, but autoimmunity could also 
occur if the CRISPR-Cas system acciden- 
tally cleaves other DNA sequences. So how 
is this prevented? It is known*”” that foreign 
DNA molecules are sampled by CRISPR- 
Cas systems more frequently than the host’s 
chromosome. Future work should investi- 
gate the mechanism underlying this selective 
sampling. ■
Black carbon and atmospheric feedbacks


Climate simulations show that interactions between particles of black carbon 
and convective and cloud processes in the atmosphere must be considered when 
assessing the full climatic effects of these light-absorbing particulates.


BEN BOOTH & NICOLAS BELLOUIN 


Black carbon’, often referred to as soot, is
emitted during the incomplete combustion
of fossil fuels, biofuels or wood. In
contrast to other particulates emitted into the
atmosphere by human activities, black carbon
absorbs sunlight efficiently. This absorption
leads to local heating of the atmosphere,
warming the planet. Black carbon has received 
particular interest recently’ in the context of 
changes in climate policy. It remains in the 
atmosphere for only a few days, so cutting 
black-carbon emissions may be a viable way 
to reduce global warming over the next few 
decades, alongside measures to mitigate
changes in the emissions of carbon dioxide.
Such actions need to be supported by a good
understanding and prediction of the climate
role of black carbon. Writing in the Journal
of Climate, Sand et al. report climate simulations
that provide insights into these issues.
The results highlight challenges for upcoming
international initiatives aimed at better understanding
how the climate responds to changes
in the composition of the atmosphere.

Figure 1 | A typical black-carbon feedback loop. a, Black-carbon emissions from sources such as
industrial processes, brick kilns and forest fires have numerous influences on the atmosphere. The
details of these influences can be strongly dependent on time and location, but the emissions generally
lead to a net surface dimming (thin yellow arrows) alongside enhanced warming at height (pink shading).
The latter factor is often associated with enhanced vertical convection and effects on clouds that have
a net result, according to Sand and colleagues’ climate simulations, of reducing cloud at altitude
(not shown). b, Sand et al. suggest that roughly half of the total climate impact of black-carbon emissions
is apparent only if these atmospheric responses feed back to the distribution of black carbon, lofting it
to height and increasing its atmospheric lifetime and spatial extent. These changes enhance any surface
dimming (extension of thin yellow arrow), and lead to greater warming at height (upper pink shading),
black-carbon transport into the upper atmosphere (dotted black arrow) and further changes to clouds.
Although, locally, greater warming can generate some deepened convective clouds, the net impact is a
further reduction of cloud at altitude.

Sand and colleagues used a numerical 
simulator of Earth’s atmosphere and ocean to 
compare the effects on climate of artificially 
large increases in the emission of carbon 
dioxide and black carbon. They find that, 
although the increases were designed to 
exert similar perturbations in Earth’s energy 
budget (the net flow of incoming and outgoing 
energy), changes in the planet’s surface tem- 
perature and rainfall are considerably weaker 
in the simulation with elevated concentrations 
of black carbon. 

This result confirms the importance of 
rapid responses in the atmosphere to changes 
in black carbon. These responses manifest 
themselves as warming at height and changes 
in cloud properties that lead to a net decrease 
in mid- and high-level cloud (Fig. 1). More- 
over, they act to offset the initial artificially 
large perturbation, mainly because the warm- 
ing and cloud loss at altitude effectively radiate 
energy to space, before the surface climate is 
able to respond. However, the magnitude of 
the rapid responses reported by Sand et al. — 
roughly seven times stronger than those to 
carbon dioxide — will come as a surprise to 
many climate scientists. 

The researchers also highlight another 
result, which has implications for numerical 
simulations of climate change. By using a pair 
of experiments, both of which explore the cli- 
mate impacts of black carbon and differ only 
in whether black-carbon changes can also 
adjust to atmospheric-circulation responses, 
Sand et al. demonstrate the role of the two-way 
black carbon-atmosphere interactions in driv- 
ing the full climate response. Their findings are 
unexpected because these interactions seem to 
be the dominant cause of the climate response 
to changes in black carbon. The change in 
global surface temperature varies by a factor 
of two between the two experiments, with con- 
siderably larger differences at altitude. Indeed, 
many rainfall responses appear only when 
feedbacks of black carbon-to-atmosphere-to- 
black carbon are included. 

The authors point out that the feedback loop 
of black carbon to itself through changes in cli- 
mate may be particularly strong in their simula- 
tions because their model contains an unusually 
active atmospheric convection. Moreover, this 
strength may be exacerbated further by the 
artificially large perturbation imposed. Experi- 
ments with other numerical models may find 
weaker responses. Nevertheless, the large differences
in the climate impacts of black carbon,
when its two-way interaction with meteorology
is also included, may make it harder to determine
black carbon’s full climate impact.

The various groups of climate scientists each 
focus on specific aspects of the climate system 
to better understand the effects of atmospheric 
changes. For those who work on atmospheric 
particulates, such as black carbon, an impor- 
tant aim is to quantify the particulates’ impact 
on Earth’s energy budget. A largely separate 
community studies atmospheric feedbacks, 
such as convection and clouds. Plans are 
already under way to design climate-model 
experiments under the Coupled Model Inter- 
comparison Project, Phase 6 (ref. 4), which will 
provide improved knowledge of future climate 
responses and feed results to the next assess- 
ment report of the Intergovernmental Panel on 
Climate Change. Contributions to several of 
these experiments will either prescribe a fixed 
meteorology to explore the impacts on Earth’s 
energy balance, or use fixed concentrations of 
atmospheric particulates to explore atmos- 
pheric feedbacks. 

Such a pragmatic approach enables groups
to concentrate resources on particular aspects
of the climate-change problem and gain better
insight into the processes involved. However,
Sand and co-workers’ findings suggest that
when it comes to understanding the full climate
impact of black carbon, it will be crucial to
account for both how black carbon influences
atmospheric circulation and also how these
changes feed back on the atmospheric distribution
of black carbon. This highlights the risk
of simplified or idealized approaches, which
may produce misleading conclusions about
the total climate impact of changes in black-carbon
concentrations. ■

EVOLUTIONARY BIOLOGY

The origin of terrestrial hearing


A study of the African lungfish reveals that it has a rudimentary ability to detect 
pressure waves caused by sound. The finding expands our knowledge of how 
hearing evolved in early tetrapods, the first vertebrates to have limbs and digits. 


JENNIFER A. CLACK 


A long-standing problem in the evolution
of land vertebrates has been how they
evolved to detect sound. Lungfishes 
are the closest living relatives of tetrapods 
(vertebrates that have limbs and digits), and so 
may help to provide an answer. Until recently, 
however, there have been few investigations 
into lungfish hearing. Writing in the Journal 
of Experimental Biology, Christensen et al.' 
report their findings about whether the Afri- 
can lungfish Protopterus annectens can detect 
sound, casting fresh light on our understand- 
ing of the hearing capabilities of the earliest 
tetrapods. 
The earliest tetrapods seem not to have had 
a specialized apparatus that would enable ter- 
restrial hearing’, so to what extent could they 
pick up air-borne sound as they came onto 
land? Although there have been many stud- 
ies of the hearing capacities of ray-finned 
fishes (actinopterygians), how relevant these 
findings are to early tetrapods has remained 
unclear, because ray-finned fishes are a
separate branch of bony fishes (osteichthyans) 
from the lobe-finned fishes (sarcopterygians), 
the group to which tetrapods and lungfishes 
belong. 

Lungfishes have no obvious adaptations for 
hearing — that is, they have no middle-ear 
cavity or a bone equivalent to the stapes bone 
in tetrapods, through which sound could 
be conveyed to the inner ear. However, they 
do have paired lungs, and Christensen and 
co-workers find that these are key to enabling 
the lungfish to detect sound. 

Overcoming the obstacles to investigating 
sound detection by fishes is not simple. Ideally, 
the experiments should be done in open water 
to avoid the influence of the experimental set- 
up on the characteristics of the sound field. 
However, a sound field that can reasonably 
be used to examine the different aspects of 
fish hearing** can be established by creating a 
standing wave in a metal tube. 
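
The value of a standing wave for this kind of experiment can be seen in a simple idealized calculation. The Python sketch below assumes a one-dimensional tube with rigid ends, in which pressure amplitude varies as a cosine and particle-velocity amplitude as a sine along the tube. The tube length, the mode number and the density and sound speed of water are assumed values, and the model is not a description of the apparatus used in the study.

```python
import math


def standing_wave_amplitudes(x, tube_length, mode, p0=1.0, rho=1000.0, c=1480.0):
    """Pressure and particle-velocity amplitudes at position x (metres).

    Idealized rigid-ended tube: pressure antinodes sit at both ends.
    p0 is the peak pressure (Pa); rho (kg/m^3) and c (m/s) are assumed
    values for the density of water and the speed of sound in it.
    """
    k = mode * math.pi / tube_length  # wavenumber of the chosen mode
    pressure = abs(p0 * math.cos(k * x))
    velocity = abs(p0 / (rho * c) * math.sin(k * x))
    return pressure, velocity


# Hypothetical 1-metre tube driven at its lowest mode.
p_end, v_end = standing_wave_amplitudes(0.0, tube_length=1.0, mode=1)
p_mid, v_mid = standing_wave_amplitudes(0.5, tube_length=1.0, mode=1)

print("end of tube: pressure", p_end, "velocity", v_end)
print("middle of tube: pressure", p_mid, "velocity", v_mid)
```

In this idealization, pressure is greatest where particle motion vanishes and vice versa, so moving an animal along the tube changes the mix of the two components that it experiences.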

By placing lungfish at different locations
within the tube, Christensen and colleagues
calibrated both pressure and particle motion
in the set-up, to establish which of these
components of the sound wave the fish were
responding to. They show that the lungfish
responds more strongly to the pressure generated
by the sound than to particle motion.
More specifically, it uses its air-filled lungs to
convert pressure to particle motion in its lung
that is then perceived by the inner ear. This is
similar to the way in which ray-finned fishes
use their swim bladder’ (an internal gas-filled
organ that allows a fish to control its buoyancy)
for sound detection.

The researchers went on to show that lungfish
can detect sound pressure waves propagated
either through water or through the
substrate (the material at the bottom of a lake
or stream) and might even have a rudimentary
capability to detect such waves in air, despite
the absence of a direct anatomical connection
between the lung and the inner ear. The
group’s earlier work’ had suggested that lungfish
were unlikely to be able to detect pressure
waves, but Christensen et al. obtained more
positive results by using a modified version of
the previously reported experimental set-up.

Their findings might have been predicted
in the light of what is known about ray-finned
fishes. But confirmation was necessary, and
has major implications for the evolution of
hearing in the earliest tetrapods. It suggests
that, if lungfish are capable of sound detection
without any obvious connection between an
air bladder and the inner ear, then the presence
of any such connection (even one not obviously
adapted for hearing) would have made
sound detection possible.

Both ray-finned and lobe-finned fishes are
thought to have possessed air bladders early
in their evolution, and may have used them in
addition to gills for breathing. The swim bladder
of ray-finned fishes is widely considered
to have a common evolutionary origin with
the lungs in lungfishes and tetrapods. All of
the early bony fishes were also equipped with
a bone called a hyomandibula that articulated
with the ear capsule at a mobile joint and controlled
ventilatory movements. It operated the
pumping action of the gill chamber, throat and
the buccal cavity (the mouth), drawing water
or air into these spaces. Air could also pass into
the air bladder by this mechanism. Lungfishes,
however, lost the hyomandibula during their
evolution, although they still breathe air using
a similar, but elaborated, buccal-pumping
mechanism.

It therefore seems that, even in the earliest
osteichthyans, the proximity of the mobile
hyomandibula to an air-filled chamber could
have allowed pressure-induced vibrations to
be transmitted to the inner ear. If air breathing
was a primitive osteichthyan characteristic,
these animals could, from the time of their
origin, have detected sound propagated in
water, through the substrate, or possibly even
in air, and may have done it rather better than
modern lungfishes.

Christensen and colleagues’ discovery
makes sense of what is known about the
earliest tetrapods from the Late Devonian and
Early Carboniferous periods (which together
spanned from about 387 million to 323 million
years ago). Two of the most obvious differences
between the ear regions of early bony
‘fish’ and the descendent early ‘tetrapods’ are
that, in the tetrapods, the hyomandibula had
become modified into the stapes, which penetrated
the braincase wall at an opening called
the fenestra vestibuli (Fig. 1); and that the
stapes had developed a structure called a stapedial
footplate’. These changes must have
marked at least some improvement in the
transmission of sound waves to the inner
ear, even though the stapes was not at that
time a slender rod-like bone as it is in most
land-dwelling tetrapods today. Rather, it was
a bulky bone that was both relatively and absolutely
much larger than in modern tetrapods”
that have an eardrum and a middle-ear cavity,
but was nonetheless capable of transmitting
vibrations emanating from pressure changes
in the air-filled gill pouch with which it was
in contact.

Figure 1 | Evolution of the hearing apparatus in tetrapods. a, In the
extinct lobe-finned fish Eusthenopteron, a bone called the hyomandibula
is associated with an air-filled gill pouch and articulates with the otic
capsule (the ear region) to control movements of the lower jaw and other
parts of the head and throat. b, Acanthostega is intermediate between
lobe-finned fishes and the first tetrapods that were fully capable of coming
onto land. A bone called the stapes (formed in part from a reduced-size
hyomandibula) penetrates the otic capsule through an opening called the
fenestra vestibuli. The stapes could transmit vibrations emanating from
sound-induced pressure changes in the air-filled gill pouch to the otic
capsule. c, Christensen et al. report that the modern African lungfish,
Protopterus annectens, uses its lung to transmit sound vibrations to the otic
capsule, and provides a model for hearing in early tetrapods. (a, b adapted
from ref. 10; c adapted from ref. 11.)
Panel labels: a, Eusthenopteron (otic capsule, gill pouch, hyomandibula); c, Lungfish (otic capsule).

The new discovery may also help to resolve 
an anomaly. One genus of Devonian tetrapod, 
Ichthyostega, had an ear region configured 
unlike that of any other known tetrapod. It 
seems to have had an air-filled chamber on each 
side of its head, roofed by thick walls formed 
by the skull, braincase and palate, but with a 
floor occupied in part by a thin, spoon-shaped 
stapes that articulated with the braincase and 
fenestra vestibuli®. This has been interpreted 
as an ear adapted for underwater hearing, yet 
other parts of Ichthyostega’s anatomy suggest 
that the animal had some adaptations for land 
locomotion’. We can now interpret the struc- 
ture as an ear capable of hearing in both aquatic 
and terrestrial conditions. ■
Defining the Anthropocene


Simon L. Lewis & Mark A. Maslin


Time is divided by geologists according to marked shifts in Earth’s state. Recent global environmental changes suggest 
that Earth may have entered a new human-dominated geological epoch, the Anthropocene. Here we review the historical
genesis of the idea and assess anthropogenic signatures in the geological record against the formal requirements for the 
recognition of a new epoch. The evidence suggests that of the various proposed dates two do appear to conform to the 
criteria to mark the beginning of the Anthropocene: 1610 and 1964. The formal establishment of an Anthropocene Epoch 
would mark a fundamental change in the relationship between humans and the Earth system. 


Human activity has been a geologically recent, yet profound,
influence on the global environment. The magnitude, variety
and longevity of human-induced changes, including land surface
transformation and changing the composition of the atmosphere,
have led to the suggestion that we should refer to the present, not as within
the Holocene Epoch (as it is currently formally referred to), but instead
as within the Anthropocene Epoch'™ (Fig. 1). Academic and popular
usage of the term has rapidly escalated** following two influential papers
published just over a decade ago’”. Three scientific journals focusing on
the topic have launched: The Anthropocene, The Anthropocene Review
and Elementa. The case for a new epoch appears reasonable: what matters
when dividing geological-scale time is global-scale changes to Earth’s status,
driven by causes as varied as meteor strikes, the movement of continents
and sustained volcanic eruptions. Human activity is now global and is the
dominant cause of most contemporary environmental change. The impacts
of human activity will probably be observable in the geological stratigraphic
record for millions of years into the future’, which suggests that a new
epoch has begun’.

Nevertheless, some question the types of evidence*”, because to define
a geological time unit, formal criteria must be met'*'!. Global-scale changes
must be recorded in geological stratigraphic material, such as rock, glacier
ice or marine sediments (see Box 1). At present, there is no formal agreement
on when the Anthropocene began, with proposed dates ranging from
before the end of the last glaciation to the 1960s. Such different meanings 
may lead to misunderstandings and confusion across several disciplines. 
Furthermore, unlike other geological time unit designations, definitions 
will probably have effects beyond geology. For example, defining an early 
start date may, in political terms, ‘normalize’ global environmental change. 
Meanwhile, agreeing a later start date related to the Industrial Revolution 
may, for example, be used to assign historical responsibility for carbon 
dioxide emissions to particular countries or regions during the industrial 
era. More broadly, the formal definition of the Anthropocene makes 
scientists arbiters, to an extent, of the human-environment relationship, 
itself an act with consequences beyond geology. Hence, there is more 
interest in the Anthropocene than other epoch definitions. Nevertheless, 
evidence will define whether the geological community formally ratifies 
a human-activity-induced geological time unit. 

We therefore review human geology in four parts. First, we summarize
the geologically important human-induced environmental impacts.
Second, we review the history of naming the epoch that modern human 
societies live within, to provide insights into contemporary Anthropocene- 
related debates. Third, we assess environmental changes caused by human 
activity that may have left global geological markers consistent with the 
formal criteria that define geological epochs. Fourth, we highlight the 


Figure 1 | Comparison of the

BOX 1
Dividing geological time 


Geological time is divided into a hierarchical series of ever-finer units 
(Fig. 1a). The present, according to The Geologic Time Scale 2012°°, is 
in the Holocene Epoch (Greek for ‘entirely recent’; started
11,650 yr BP), within the Quaternary Period (started 2.588 million
years ago), within the Cenozoic Era (‘recent life’; started 66 million 
years ago) of the Phanerozoic Eon (‘revealed life’; started 541 million 
years ago). Divisions represent differences in the functioning of Earth 
as a system and the concomitant changes in the resident life-forms. 
Larger differences result in classifications at higher unit-levels. 

Formally, geological time units are defined by their lower boundary, 
that is, their beginning. Boundaries are demarcated using a GSSP, or if
good candidate GSSPs do not exist, by an agreed date, termed a
GSSA?°. For a GSSP, a ‘stratotype section’ refers to a portion of material
that develops over time (rock, sediment, glacier ice), and ‘point’ refers
to the location of the marker within the stratotype. Each ‘golden
spike’ is a single physical manifestation of a change recorded in a
stratigraphic section, often reflecting a global-change phenomenon. 
GSSP markers are then complemented by a series of correlated 
changes, also recorded stratigraphically, termed auxiliary stratotypes, 
indicating widespread changes to the Earth system occurring at 
that time?°. An exemplary GSSP is the Cretaceous—Paleogene period- 
level boundary, and the start of the Cenozoic Era, when non-avian 
dinosaurs declined to extinction and mammals radically increased in 
variety and abundance. The GSSP boundary marker is the peak in 
iridium—a residual of bolide impact with Earth—in rock dated at 
66 million years ago, located at El Kef, Tunisia’®. 

The widespread appearance of new species can also be used as 
GSSP boundary markers; for example, the Ordovician-Silurian period- 
level boundary, 443.8 million years ago, is marked by the appearance 
of a distinct planktonic graptolite, Akidograptus ascensus (a now-extinct 
hemichordate)’°. From an Anthropocene perspective this example 
shows that the GSSP primary marker chosen as a boundary indicator 
may be of limited importance compared to the other events taking 
place that collectively show major changes to Earth at that time®’. 

Formally, a GSSP must have (1) a principal correlation event 
(the marker), (2) other secondary markers (auxiliary stratotypes),
(3) demonstrated regional and global correlation, (4) complete
continuous sedimentation with adequate thickness above and below 
the marker, (5) an exact location—latitude, longitude and height/ 
depth—because a GSSP can be located at only one place on Earth, 
(6) be accessible, and (7) have provisions for GSSP conservation and 
protection?®. 
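
As an illustration only, the seven formal requirements listed above can be written as a simple checklist. The sketch below is a hypothetical Python data structure: the class name `GSSPCandidate`, its field names and the example values are assumptions made for illustration and are not part of any stratigraphic standard. It records which requirements a candidate section meets and reports those still outstanding.

```python
from dataclasses import dataclass, fields


@dataclass
class GSSPCandidate:
    """Hypothetical checklist of the seven formal GSSP requirements."""
    name: str
    principal_correlation_event: bool = False      # (1) the primary marker
    auxiliary_stratotypes: bool = False            # (2) secondary markers
    regional_and_global_correlation: bool = False  # (3)
    continuous_sedimentation: bool = False         # (4) adequate thickness above and below
    exact_location: bool = False                   # (5) latitude, longitude, height/depth
    accessible: bool = False                       # (6)
    conservation_provisions: bool = False          # (7)

    def unmet(self):
        """Return the names of the requirements not yet satisfied."""
        return [f.name for f in fields(self)
                if f.name != "name" and not getattr(self, f.name)]

    def is_complete(self):
        """True only when all seven requirements are satisfied."""
        return not self.unmet()


# Invented example values, not an assessment of any real section.
candidate = GSSPCandidate(
    name="illustrative section",
    principal_correlation_event=True,
    exact_location=True,
)
print(candidate.unmet())
```

A structure like this makes explicit that a primary marker alone is not sufficient: the candidate remains incomplete until every requirement is met.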

Alternatively, following a survey of the stratigraphic evidence, a 
GSSA date may be agreed by committee to mark a time unit boundary.
GSSAs are typical in the Precambrian (>541 million years ago) 
because well-defined geological markers and clear events are less 
obvious further back in time?®. Regardless of the marker type, formally 
ratifying a new Anthropocene Epoch into the GTS would first require a 
positive recommendation from the Anthropocene Working Group of 
the Subcommission of Quaternary Stratigraphy, followed by a 
supermajority vote of the International Commission on Stratigraphy, 
and finally ratification by the International Union of Geological 
Sciences’? (see ref. 11 for full details). 


advantages and disadvantages of the few global markers that may indicate 
a date to define the beginning of the Anthropocene. By consolidating 
research from disparate fields and the emerging Anthropocene-specific 
literature we aim to constrain the number of possible Anthropocene start 
dates, highlight areas requiring further research, and assist in moving 
towards an evidence-based decision on the possible ratification of a new 
Anthropocene Epoch.

The geological importance of human actions


Human activity profoundly affects the environment, from Earth’s major 
biogeochemical cycles to the evolution of life. For example, the early- 
twentieth-century invention of the Haber-Bosch process, which allows 
the conversion of atmospheric nitrogen to ammonia for use as fertilizer, 
has altered the global nitrogen cycle so fundamentally that the nearest 
suggested geological comparison refers to events about 2.5 billion years 
ago’*. Human actions have released 555 petagrams of carbon (where
1 Pg = 10^15 g = 1 billion metric tons) to the atmosphere since 1750, increasing
atmospheric CO2 to a level not seen for at least 800,000 years, and
possibly several million years'*"’, thereby delaying Earth’s next glaciation 
event'’. The released carbon has increased ocean water acidity at a rate 
probably not exceeded in the last 300 million years’®. 

Human action also affects non-human life. Global net primary productivity
appears to be relatively constant’’; however, the appropriation
of 25-38% of net primary productivity for human use'”'* reduces the
amount available for millions of other species on Earth. This land-use
conversion to produce food, fuel, fibre and fodder, combined with targeted
hunting and harvesting, has resulted in species extinctions some
100 to 1,000 times higher than background rates’’, and probably constitutes
the beginning of the sixth mass extinction in Earth’s history’”.
Species removals are non-random, with greater losses of large-bodied
species from both the land and the oceans. Organisms have been transported
around the world, including crops, domesticated animals and
pathogens on land. Similarly, boats have transferred organisms among
once-disconnected oceans. Such movement has led to a small number of
extraordinarily common species, new hybrid species”, and a global homogenization
of Earth’s biota. Ostensibly, this change is unique since Pangaea
separated about 200 million years ago", but such trans-oceanic exchanges
probably have no geological analogue.

Furthermore, human actions may well constitute Earth’s most important
evolutionary pressure””’. The development of diverse products, including
antibiotics”, pesticides”, and novel genetically engineered organisms”,
alongside the movement of species to new habitats”, intense harvesting”
and the selective pressure of higher air temperatures resulting from greenhouse
gas emissions, are all likely to alter evolutionary outcomes” ”*. Considered
collectively, there is no geological analogue”. Furthermore, given
that the lifespan of a species is typically 1-10 million years, the rates of 
anthropogenic environmental change in the near future may exceed the 
rates of change encountered by many species in their evolutionary history. 
Human activity has clearly altered the land surface, oceans and atmosphere, 
and re-ordered life on Earth. 


Historical human geology 


Human-related geological time units have a long history*®. In 1778 
Buffon published an early attempt to describe Earth’s history, allocating 
a human epoch to be Earth’s seventh and final epoch, paralleling the 
seven-day creation story’’. By the nineteenth century, divine intervention
was receding from consideration as a geological force. In 1854 the
Welsh geologist and professor of theology, Thomas Jenkyn, appears to 
have first published the idea of an explicitly evidence-based human 
geological time unit in a series of widely disseminated geology lessons”. 
He describes the then present day as “the human epoch” based on the 
likely future fossil record”. In his final lecture he wrote, “All the recent 
rocks, called in our last lesson Post-Pleistocene, might have been called 
Anthropozoic, that is, human-life rocks.”’’. Similarly, the Reverend 
Haughton’s 1865 Manual of Geology describes the Anthropozoic as 
the “epoch in which we live”*', as did the Italian priest and geologist 
Antonio Stoppani a decade later”. Meanwhile in the USA, the geology 
professor James Dwight Dana’s then-popular 1863 Manual of Geology’ 
extensively refers to the “Age of Mind and Era of Man” as the youngest 
geological time, as did many of his US contemporaries™. 

In 1830 Charles Lyell had proposed that contemporary time be termed 
the Recent epoch” on the basis of three considerations: the end of the last 
glaciation, the then-believed coincident emergence of humans, and the
rise of civilizations””*. In the 1860s, the French geologist Paul Gervais
made Lyell’s term international, coining the term Holocene, derived from 
the Greek for ‘entirely recent’. Thus, most nineteenth-century geological 
textbooks feature humans as part of the definition of the most recent 
geological time units. Critically, there was little discussion about any of 
these terms—Recent, Holocene or Anthropozoic—probably because each 
represented the same conceptual model and broad agreement that humans 
were part of the definition of the contemporary geological epoch. However, 
the wider written records of these often deeply religious men show that a 
separate human epoch was likely to have been more strongly influenced 
by theological concerns—in particular, separating Homo sapiens from 
other animals and retaining humans at the apex of life on Earth—than by 
the appraisal of stratigraphic evidence. 

In the twentieth century, geologists in the West increasingly used the term 
Holocene for the current epoch, and Quaternary for the period. Meanwhile, 
in 1922 the Russian geologist Aleksei Pavlov described the present day 
as part of an “Anthropogenic system (period) or Anthropocene’”**. The 
Ukrainian geochemist Vladimir Vernadsky then brought to widespread 
attention the idea that the biosphere, combined with human cognition, 
had created the Noösphere (from the Greek for mind), with humans
becoming a geological force*’. The term Noösphere was not well used,
but non-Western scientists often used anthropogenic geological time units. 
The Russian term was anglicized as both Anthropogene and Anthropocene”, 
sometimes creating confusion. The East-West differences in usage may 
have been due to differing political ideologies: an orthodox Marxist view 
of the inevitability of global collective human agency transforming the 
world politically and economically requires only a modest conceptual leap 
to collective human agency as a driver of environmental transformation. 
Again there was little broad interest in the various terms. The Holocene 
became the official term within the Geologic Time Scale (GTS; Fig. 1)’?**, 
with its implication that the current interglacial differs from the previous 
Pleistocene interglacials owing to the influence of humans. It has therefore
been argued that an Anthropocene Epoch is not required, given that
some human influence is already contained within the definition of the
Holocene Epoch’. Alternatively, defining the Anthropocene would deprive
the Holocene Epoch of its ostensibly unique feature, humans, suggesting
that the Holocene as an epoch may not be required. 

The views of nineteenth- and twentieth-century scientists illustrate 
the influence of the dominant contemporary concerns on geological 
debates. Today’s scientists may also not be immune to such influences. 
For example, a key concern for scientists and others is the central role of 
technology in modern society and its environmental impacts. Crutzen 
and Stoermer' originally proposed that the start of the Anthropocene 
should be coincident with the beginning of the Industrial Revolution and 
James Watt’s 1784 refinement of the steam engine. Others followed, 
including stratigraphers, suggesting that 1800 should be the beginning 
of the Anthropocene*””’, despite a lack of corresponding global geological
markers, and the presence of well-known stratigraphic evidence
suggestive of different dates, such as the radionuclide fallout from
mid-twentieth-century nuclear weapons tests. Care is needed to ensure that
the dominant culture of today’s scientists does not subconsciously influence
the assessment of stratigraphic evidence.


A human golden spike 


Defining the beginning of the Anthropocene as a formal geologic unit of 
time requires the location of a global marker of an event in stratigraphic 
material, such as rock, sediment, or glacier ice, known as a Global Stratotype
Section and Point (GSSP), plus other auxiliary stratigraphic markers
indicating changes to the Earth system. Alternatively, after a survey of the 
stratigraphic evidence, a date can be agreed by committee, known as a 
Global Standard Stratigraphic Age (GSSA). GSSPs, known as ‘golden spikes’, 
are the preferred boundary markers"® (see Box 1). 

Generally, geologists have used temporally distant changes in multiple 
stratigraphic records to delimit major changes in the Earth system and 
thereby geological time units, for example, the appearance of new species 
as fossils within rocks, coupled with other temporally coincident changes. 




Perhaps the most useful GSSP example when considering a possible 
Anthropocene GSSP is that marking the beginning of the most recent 
epoch, the Holocene”, because some similar choices and difficulties were 
faced. These include: not relying on solid aggregate mineral deposits (‘rock’) 
for the boundary; an event horizon largely lacking fossils (although fossils 
are used to recognize Holocene deposits); the need for very precise GSSP 
dating of events in the recent past; and how to formalize a time unit that 
extends to the present and thereby implicitly includes a view of the future. 

Depending on the parameter considered, the current interglacial took 
decades to millennia to unfold, as global climate, atmospheric chemistry 
and the distribution of plant and animal species all altered. From these 
changes a single dated level within a single stratigraphic record was required 
to be chosen as a GSSP primary marker (Box 1; Fig. 2). Thus, formally, the 
Holocene is marked by an abrupt shift in deuterium (²H) excess values
at a depth of 1,492.25 m in the NorthGRIP Greenland ice core, dated
11,650 ± 99 yr BP (before present, where ‘present’ is defined to be 1950)**.
This corresponds to the first signs of predominantly Northern Hemisphere
climatic warming at the end of the Younger Dryas/Greenland Stadial 1
cold period** (Fig. 2). Five further auxiliary stratotypes (four lakes and one 
marine sediment) showing clear correlated changes across the boundary 
complement the GSSP, consistent with the occurrence of global changes 
to the Earth system**. The requirements for a formal definition of the start 
of the Anthropocene are similar: a clear, datable marker documenting a 
global change that is recognizable in the stratigraphic record, coupled 
with auxiliary stratotypes documenting long-term changes to the Earth 
system. 
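
Because ages in this discussion are quoted in years BP, with ‘present’ fixed at 1950, it can help to see the conversion to calendar years written out. The following is a minimal sketch, not a dating tool: the function names are hypothetical, and it ignores the quoted dating uncertainty entirely.

```python
PRESENT_BP = 1950  # 'present' in 'yr BP' is defined as calendar year 1950


def bp_to_year(years_bp, present=PRESENT_BP):
    """Convert an age in years BP to an astronomical calendar year.

    Astronomical numbering includes a year 0, so 0 means 1 BCE,
    -1 means 2 BCE, and so on.
    """
    return present - years_bp


def format_year(year):
    """Format an astronomical year as a CE/BCE label."""
    if year > 0:
        return f"{year} CE"
    return f"{1 - year} BCE"


# The Holocene GSSP age quoted in the text (central value only).
holocene_start = bp_to_year(11_650)
print(holocene_start, format_year(holocene_start))
```

The same arithmetic shows that a calendar date of 1610 corresponds to 340 yr BP, which is why markers a few centuries old are usually quoted directly as calendar years.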

Defining the Anthropocene presents a further challenge. Changes to 
the Earth system are not instantaneous. However, even spatially heterogeneous
and diachronous (producing similar stratigraphic material varying
in age) changes appear near-instantaneous when viewed millions of
years after the event, especially as time-lags often fall within the error
range of the dating techniques. In contrast, Anthropocene deposits are
commonly dated on decadal or annual scales, so that all changes will appear
diachronous, to some extent, from today’s perspective (but not from far
in the future)''*'. Judgement will be required to assess whether the time-lags
following events and their significant global impacts are too long to
be of use when defining any Anthropocene GSSP.

Several approaches have been put forward to define when the
Anthropocene began, including those focusing on the impact of fire”,
pre-industrial farming****, sociometabolism*’, and industrial technologies'’?°*!”,
but the relative merits of the evidence for various starting
dates have not been systematically assessed against the requirements of a
golden spike. Below, we review the major events in human history and
pre-history and their impact on stratigraphic records. We focus on continuous
stratigraphic material that may yield markers consistent with a
GSSP (lake and marine sediments, glacier ice) and on the types of chemical,
climatic and biological changes used to denote other epoch boundaries
further in the past. We proceed chronologically forward in time,
presenting the reason why each event was originally proposed, evaluating
the existence of stratigraphic markers, and assessing whether the event provides
a potential GSSP. The hypotheses and evidence are summarized in
Table 1. Following the evidence review we briefly consider the relative
merits of the differing events that probably fulfil the GSSP criteria, and
assess related GSSA dates.


Pleistocene human impacts 

The first major impact of early humans on their environment was
probably the use of fire. Fossil charcoal captures these events from the
Early Pleistocene Epoch**“**. However, fires are inherently local events,
so they do not provide a global GSSP. The next suggested candidate is
the Megafauna Extinction between 50,000 and 10,000 years ago, given
that other epoch boundaries have been defined on the basis of extinctions
or on the resultant newly emerging species'®. Overall, during the
Megafauna Extinction about half of all large-bodied mammals worldwide,
equivalent to 4% of all mammal species, were lost*’. The losses
were not evenly distributed: Africa lost 18%, Eurasia lost 36%, North
America lost 72%, South America lost 83%, and Australia lost 88% of
their large-bodied mammalian genera*®”’. So the Megafauna Extinction
was actually a series of events on differing continents at differing times and
therefore lacks the required precision for an Anthropocene GSSP marker.

Figure 2 | Defining the beginning of the Anthropocene. a, Current GTS2012
GSSP boundary between the Pleistocene and Holocene** (dashed line), with
global temperature anomalies (relative to the early Holocene average over the
period 11,500 BP to 6,500 BP)'’* (blue), and atmospheric carbon dioxide
composite!’ on the AICC2012 timescale'"* (red). b, Early Anthropogenic
Hypothesis GSSP suggested boundary (dashed line), which posits that early
extensive farming impacts caused global environmental changes, defined
here by the inflection and lowest level of atmospheric methane (in parts per
billion, p.p.b.) from the GRIP ice core” (green), with global temperature
anomalies (relative to the average over the period 1961 to 1990)'° (blue), and
atmospheric carbon dioxide!’ (red). c, Orbis GSSP suggested boundary
(dashed line), representing the collision of the Old and New World peoples and
homogenization of once distinct biotas, and defined by the pronounced dip in
atmospheric carbon dioxide (dashed line) from the Law Dome ice core”””®
(blue), with global temperature data anomalies (relative to the average over
the period 1961 to 1990)''* (red). d, Bomb GSSP suggested boundary (dashed
line), characterized by the peak in atmospheric radiocarbon from annual
tree-rings (black)’” (the Δ14C value is the relative difference between the
absolute international standard (base year 1950) and sample activity corrected
for the time of collection and δ13C), with atmospheric carbon dioxide from
Mauna Loa, Hawaii, post-1958''°, and ice core records pre-1958”°”° (red),
and global temperature anomalies (relative to the average over the period
1961 to 1990)"6 (blue).


Origins and impacts of farming 
The development of agriculture causes long-lasting anthropogenic environmental
impacts as it replaces natural vegetation, and thereby increases
species extinction rates, and alters biogeochemical cycles. Agriculture
had multiple independent origins: first occurring about 11,000 years 
ago in southwest Asia, South America and north China; between 
6,000-7,000 years ago in Yangtze China and Central America; and 
4,000-5,000 years ago in the savanna regions of Africa, India, southeast 
Asia, and North America”. Thus, the increasing presence of fossil pollen 
from domesticated plants in sediment is too local and lacking in global 
synchrony to form a GSSP marker. Critically, for the Holocene GSSP,
auxiliary markers within stratigraphic material did not include any human-derived
markers™, illustrating the lack of anthropogenic impacts at that
time. Long-lasting cultural evidence related to agriculture is similarly constrained.
Although ceramics are datable and preserved in stratigraphic
records (for example, the mineral mullite“’), they appeared in Africa before
agriculture, while early southwest Asian farming cultures did not produce
ceramics. Similarly, anthropogenically formed soils, derived from intensive
farmland management, have also been suggested as a marker of the
Anthropocene*’. Although these soils are widespread, like vegetation clearance,
they are highly diachronous over about 2,000 yr, thus excluding
their use as a GSSP marker™.

A series of Neolithic revolutions resulted in the majority of Homo
sapiens becoming agriculturalists to some extent by around 8,000 yr BP,
rising to a maximum of about 99% by about 500 yr BP**. The Early Anthropogenic
Hypothesis posits that the current interglacial was similar to
the previous seven interglacial periods until around 8,000 yr BP****. By
comparison with the closest astronomical analogue of the current interglacial
(795,000-780,000 yr BP)*’, atmospheric CO2 should have continued
to decline after 8,000 yr BP, eventually reaching about 240 parts per
million (p.p.m.), and the onset of glaciation should have begun****. However,
by 6,000-8,000 yr BP, farmers’ conversion of high-carbon storage
vegetation (forest, woodland, woody savanna) to crops and grazing lands,
plus associated fire impacts, may have increased atmospheric CO2 levels,
and postponed this new glaciation** (Fig. 2). Thus, the lowest level of CO2
within an ice core record could, in principle, provide a golden spike, but
the CO2 record lacks a distinct inflection point at this time (Fig. 2). Furthermore,
the evidence that human activity was responsible for the gradual
increase in CO2 after 6,000 yr BP is extensively debated*****.

Methane provides a clearer inflection point, which may provide a possible
GSSP at 5,020 yr BP, the date of the lowest methane value recorded in
the GRIP ice core” (Fig. 2). Archaeological evidence suggests that the
inflection is caused by rice cultivation in Asia and the expansion of populations
of domesticated ruminants. Comparisons of changes in atmospheric
methane from the current and past interglacials”* and some methane δ13C
value evidence” also suggest a human cause. However, a model study
suggests that orbital forcing altering methane emissions from tropical
wetlands may be responsible*’. Auxiliary markers could include stone
axes and fossilized domesticated crop pollen and ruminant remains, but
these do not provide temporally well-correlated markers that collectively
document globally synchronous changes to the Earth system.


Collision of the Old and New Worlds 

The arrival of Europeans in the Caribbean in 1492, and subsequent annexing
of the Americas, led to the largest human population replacement in
the past 13,000 years”, the first global trade networks linking Europe,
China, Africa and the Americas®, and the resultant mixing of previously
separate biotas, known as the Columbian Exchange®’*. One biological
result of the exchange was the globalization of human foodstuffs.
The New World crops maize/corn, potatoes and the tropical staple manioc/cassava
were subsequently grown across Europe, Asia and Africa. Meanwhile,
Old World crops such as sugarcane and wheat were planted in the New
World. The cross-continental movement of dozens of other food species
(such as the common bean, to the New World), domesticated animals (such
as the horse, cow, goat and pig, all to the Americas) and human commensals
(the black rat, to the Americas), plus accidental transfers (many species
of earthworms, to North America; American mink to Europe) contributed
to a swift, ongoing, radical reorganization of life on Earth without
geological precedent.

Table 1 | Potential start dates for a formal Anthropocene Epoch

Event | Date | Geographical extent | Primary stratigraphic marker | Potential GSSP date* | Potential auxiliary stratotypes
Megafauna extinction | 50,000-10,000 yr BP | Near-global | Fossil megafauna | None, diachronous over ~40,000 yr | Charcoal in lacustrine deposits
Origin of farming | ~11,000 yr BP | Southwest Asia, becoming global | Fossil pollen or phytoliths | None, diachronous over ~5,000 yr | Fossil crop pollen, phytoliths, charcoal
Extensive farming | ~8,000 yr BP to present | Eurasian event, global impact | CO2 inflection in glacier ice | None, inflection too diffuse | Fossil crop pollen, phytoliths, charcoal, ceramic minerals
Rice production | 6,500 yr BP to present | Southeast Asian event, global impact | CH4 inflection in glacier ice | 5,020 yr BP CH4 minima | Stone axes, fossil domesticated ruminant remains
Anthropogenic soils | ~3,000-500 yr BP | Local event, local impact, but widespread | Dark high organic matter soil | None, diachronous, not well preserved | Fossil crop pollen
New-Old World collision | 1492-1800 | Eurasian-Americas event, global impact | Low point of CO2 in glacier ice | 1610 CO2 minima | Fossil pollen, phytoliths, charcoal, CH4, speleothem δ18O, tephra†
Industrial Revolution | 1760 to present | Northwest Europe event, local impact, becoming global | Fly ash from coal burning | ~1900 (ref. 94); diachronous over ~200 yr | 14N:15N ratio and diatom composition in lake sediments
Nuclear weapon detonation | 1945 to present | Local events, global impact | Radionuclides (14C) in tree-rings | 1964 14C peak§ | 240Pu:239Pu ratio, compounds from cement, plastic, lead and other metals
Persistent industrial chemicals | ~1950 to present | Local events, global impact | For example, SF6 peak in glacier ice | Peaks often very recent so difficult to accurately date§ | Compounds from cement, plastic, lead and other metals

For compliance with a Global Stratotype Section and Point (GSSP) definition, a clearly dated global marker is required, backed by correlated auxiliary markers that collectively indicate global and other widespread and long-term changes to the Earth system. BP, before present, where present is defined as calendar date 1950.
* Requires a specific date for a GSSP primary marker.
† From Huaynaputina eruption in 1600 (refs 78, 79).
§ Peak, rather than earliest date of detection selected, because earliest dates reflect available detection technology, are more likely influenced by natural background geochemical levels (ref. 101) and will be more affected by the future decay of the signal, than peak values.



In terms of stratigraphy, the appearance of New World plant species 
in Old World sediments—and vice versa—may provide a common marker 
of the Anthropocene across many deposits because pollen is often well 
preserved in marine and lake sediments. For example, pollen of New 
World native Zea mays (maize/corn), which preserves very well’, first 
appears in a European marine sediment core in 1600°. The European 
Pollen Database lists a further 70 lake and marine sediment cores containing
Zea mays after this date. Phytoliths can similarly record such
range expansions”. Specifically, the transcontinental range extension
of at least one Old World species into the New World (banana, as phytoliths
in Central and tropical South America sediments) and a second
species from the New World expanding into the Old World (maize/corn,
as pollen preserved in sediments in Eurasia and Africa) together constitute
a unique signature in the stratigraphic record. This transcontinental
range expansion—stratigraphically marking before and after an event— 
is comparable to the use of the appearance of new species as boundary 
markers in other epoch transitions”. 

Besides permanently and dramatically altering the diet of almost all of humanity, the arrival of Europeans in the Americas also led to a large decline in human numbers. Regional population estimates sum to a total of 54 million people in the Americas in 1492, with recent population modelling estimates of 61 million people. Numbers rapidly declined to a minimum of about 6 million people by 1650 via exposure to diseases carried by Europeans, plus war, enslavement and famine. The accompanying near-cessation of farming and reduction in fire use resulted in the regeneration of over 50 million hectares of forest, woody savanna and grassland, with a carbon uptake by vegetation and soils estimated at 5-40 Pg within around 100 years. The approximate magnitude and timing of carbon sequestration suggest that this event significantly contributed to the observed decline in atmospheric CO2 of 7-10 p.p.m. (1 p.p.m. CO2 = 2.1 Pg of carbon) between 1570 and 1620 documented in two high-resolution Antarctic ice core records (Fig. 2 and Box 2). This dip in atmospheric CO2 is the most prominent feature, in terms of both rate of change and magnitude, in pre-industrial atmospheric CO2 records over the past 2,000 years (Fig. 2).
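
As a rough check on the magnitudes quoted above, the stated conversion factor of 2.1 Pg of carbon per p.p.m. of CO2 can be applied in both directions. The following is a minimal sketch only: the function and variable names are hypothetical, and the calculation ignores the ocean and land feedbacks that would reduce the atmospheric response over time.

```python
# Conversion factor quoted in the text: 1 p.p.m. CO2 = 2.1 Pg of carbon.
PG_C_PER_PPM = 2.1


def pg_to_ppm(pg_carbon: float) -> float:
    """Convert a carbon mass (Pg C) to an equivalent atmospheric CO2 change (p.p.m.)."""
    return pg_carbon / PG_C_PER_PPM


def ppm_to_pg(ppm: float) -> float:
    """Convert an atmospheric CO2 change (p.p.m.) to an equivalent carbon mass (Pg C)."""
    return ppm * PG_C_PER_PPM


# Ranges quoted in the text.
uptake_pg = (5.0, 40.0)   # estimated uptake by vegetation and soils, Pg C
dip_ppm = (7.0, 10.0)     # observed CO2 decline, p.p.m.

# Illustrative conversions, rounded to one decimal place.
uptake_as_ppm = tuple(round(pg_to_ppm(x), 1) for x in uptake_pg)
dip_as_pg = tuple(round(ppm_to_pg(x), 1) for x in dip_ppm)

print("Uptake expressed as p.p.m.:", uptake_as_ppm)
print("Observed dip expressed as Pg C:", dip_as_pg)
```

On this simple conversion the 5-40 Pg uptake range corresponds to roughly 2.4-19 p.p.m., which brackets the observed 7-10 p.p.m. dip (about 15-21 Pg of carbon).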

On the basis of the movement of species, atmospheric CO2 decline and the resulting climate-related changes within various stratigraphic records, we propose that the 7-10 p.p.m. dip in atmospheric CO2 to a low point of 271.8 p.p.m. at 285.2 m depth of the Law Dome ice core, dated 1610 (±15 yr; refs 75, 76), is an appropriate GSSP marker (Fig. 2). Auxiliary stratotypes could include: the first occurrence of a cross-ocean range extension in the fossil record (Zea mays, in 1600) plus a range of deposits showing distinct changes at that time, including tephra and other signatures from the 1600 Huaynaputina eruption detected at both poles and in the tropics; charcoal reductions in deposits in the Americas and globally; decreases in atmospheric methane, enrichment of methane δ13C, and decreases in carbon monoxide in Antarctic ice cores; pollen in lacustrine sediments showing vegetation regeneration; proxies indicating anomalous Arctic sea-ice extent; changing δ18O derived from speleothems from caves in China and Peru; and other studies noting changes coincident with 1600 and the coolest part of the Little Ice Age (1594-1677; ref. 87), a relatively synchronous global event noted in geologic deposits worldwide.

The impacts of the meeting of Old and New World human populations, including the geologically unprecedented homogenization of Earth’s biota, may serve to mark the beginning of the Anthropocene. Although it represents a major event in world history, the collision of the Old and New Worlds has not been proposed previously, to our knowledge, as a possible GSSP. We suggest naming the dip in atmospheric CO2 the ‘Orbis spike’ and the suite of changes marking 1610 as the beginning of the Anthropocene the ‘Orbis hypothesis’, from the Latin for world, because post-1492 humans on the two hemispheres were connected, trade became global, and some prominent social scientists refer to this time as the beginning of the modern ‘world-system’.


Industrialization 

The beginning of the Industrial Revolution has often been suggested as the beginning of the Anthropocene, because accelerating fossil fuel use and coupled rapid societal changes herald something important and unique in human history. Yet humans have long been engaging in industrial-type production, such as metal utilization from around 8,000 yr BP onwards, with attendant pollution. Elevated mercury records are documented at around 3,400 yr BP in the Peruvian Andes, while the impacts of Roman Empire copper smelting are detectable in a Greenland ice core at around 2,000 yr BP. This metal pollution, like other examples predating the Industrial Revolution, is too local and diachronous to provide a golden spike.
BOX 2
Origins of the 1610 decrease in
atmospheric CO2


Is the CO2 decline real?

Two independent high-resolution Antarctic ice core records from the Law Dome and the Western Antarctic Ice Sheet show a reduction in atmospheric CO2 of 7-10 p.p.m. between 1570 and 1620 (Fig. 2). A smaller CO2 decrease is also observed in less highly resolved Antarctic cores. The decline exceeds the measurement error of the cores, 1-2 p.p.m., and experiments suggest that it does not result from in situ changes within the ice core.


Did human activity cause the decline? 

The arrival of Europeans in the Americas led to a catastrophic decline in human numbers, with about 50 million deaths between 1492 and 1650, according to several independent sources. Contemporary field observations of soil and vegetation carbon dynamics following agricultural abandonment suggest that about 65 million hectares (that is, 50 million people × 1.3 hectares per person) would sequester 7-14 Pg of carbon over 100 years (that is, 100-200 Mg of carbon per hectare total uptake, above- and below-ground). Reduction in fire use for land management would additionally increase carbon uptake outside farmed areas.
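
The carbon-accounting arithmetic in this paragraph can be sketched as follows. This is an illustration under the stated inputs only; the names are hypothetical and the unit conversion (1 Pg = 10^9 Mg) is the only value not taken from the box.

```python
# Inputs quoted in the box.
DEATHS = 50e6                        # people, 1492-1650
HECTARES_PER_PERSON = 1.3            # farmed land abandoned per person
UPTAKE_MG_C_PER_HA = (100.0, 200.0)  # total uptake over 100 years, Mg C per hectare

MG_PER_PG = 1e9  # 1 Pg = 1e15 g = 1e9 Mg


def abandoned_area_ha(deaths: float, ha_per_person: float) -> float:
    """Area of farmland abandoned, in hectares."""
    return deaths * ha_per_person


def sequestered_pg(area_ha: float, uptake_mg_per_ha: float) -> float:
    """Carbon taken up by regrowth on the abandoned area, in Pg C."""
    return area_ha * uptake_mg_per_ha / MG_PER_PG


area = abandoned_area_ha(DEATHS, HECTARES_PER_PERSON)
low, high = (sequestered_pg(area, u) for u in UPTAKE_MG_C_PER_HA)

print(f"Abandoned area: {area / 1e6:.0f} million hectares")
print(f"Carbon uptake over 100 years: {low:.1f}-{high:.1f} Pg C")
```

The simple product gives about 65 million hectares and roughly 6.5-13 Pg of carbon, close to but not identical to the 7-14 Pg quoted; the difference presumably reflects rounding or inputs not stated here.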

Studies using a variety of methods report broadly consistent estimates of carbon uptake by vegetation of 5-40 Pg (2.1 Pg of carbon = 1 p.p.m. atmospheric CO2 over shorter timescales, lessening over time). Given that maximum human mortality rates were not reached for some decades after 1492, and maximum carbon uptake would take place 20-50 yr after farming abandonment, peak carbon sequestration would occur approximately between 1550 and 1650.

Some model studies spanning thousands of years find a net land surface carbon uptake spanning 1500-1650 across the Americas, while others do not. However, in general, evidence from such studies weakly constrains the problem because Holocene carbon cycle modelling is designed to investigate changes associated with long-acting slow processes (carbon uptake by peat or coral reefs) and feedback mechanisms (oceanic outgassing, oceanic uptake and CO2 fertilization of vegetation), and probably poorly represents the short period of the CO2 dip (for example, ref. 57). For example, a study calculating a net zero impact of the cessation of farming in the Americas included a large soil carbon flux to the atmosphere, which contradicts field evidence, and had the effect of offsetting the uptake from growing trees. Carbon cycle models with robust representations of land-use change and subsequent vegetation regeneration following the Americas population catastrophe will be required to improve estimates of carbon uptake compared with carbon accounting studies.

The approximate magnitude and timing of carbon sequestration make the population decline in the Americas the most likely cause of the observed decline in atmospheric CO2. Atmospheric and tropical marine δ13C analyses also support uptake of CO2 by vegetation rather than oceanic uptake. The 1600 Huaynaputina eruption in Peru (refs 78, 79) probably exacerbated the CO2 minimum, and a lagged oceanic outgassing in response to the land carbon uptake probably contributed to the fast rebound of atmospheric CO2 after 1610. In addition, multi-proxy reconstructions of temperature indicate that, after accounting for both solar and volcanic radiative forcing, additional terrestrial carbon uptake is required to explain temperature declines over the 1550-1650 period. This is consistent with uptake by vegetation following the population crash in the Americas.


176 | NATURE | VOL 519 | 12 MARCH 2015 


Definitions of the Industrial Revolution give an onset date anywhere between 1760 and 1880, beginning as an event local to northwest Europe. Given the initial slow spread of coal use, ice core records show little impact on global atmospheric CO2 concentration until the nineteenth century, and then they show a relatively smooth increase rather than an abrupt change, precluding this as a GSSP marker (Fig. 2). Similarly, other associated changes, including methane and nitrate, products of fossil fuel burning (including spherical carbonaceous particles and magnetic fly ash) plus resultant changes in lake sediments, altered slowly as the use of fossil fuels increased over many decades. Lead, which was once routinely added to vehicle fuels, has been proposed as a possible marker, because leaded fuel was almost globally used and is now banned. However, peak lead isotope ratio values from this source in sediments and other deposits vary from 1940 to after 1980, limiting the utility of this marker. The Industrial Revolution thus provides a number of markers spreading from northwest Europe to North America and expanding worldwide since about 1800, although none provides a clear global GSSP primary marker.


The Great Acceleration 

Since the 1950s the influence of human activity on the Earth system has increased markedly. This ‘Great Acceleration’ is marked by a major expansion in human population, large changes in natural processes, and the development of novel materials from minerals to plastics to persistent organic pollutants and inorganic compounds. Among these many changes the global fallout from nuclear bomb tests has been proposed as a global event horizon marker. The first detonation was in 1945, with a peak in atmospheric testing from the late 1950s to early 1960s, followed by a rapid decline following the Partial Test Ban Treaty in 1963 and later agreements, such that only low test levels continue to the present day (Fig. 2). A resulting distinct peak in radioactivity is recorded in high-resolution ice cores, lake and salt marsh sediments, corals, speleothems and tree-rings from the early 1950s onwards, declining in the late 1960s. The clearest signal is from atmospheric 14C, seen in direct air measurements and captured by tree-rings and glacier ice, which reaches a maximum in the mid- to high-latitude Northern Hemisphere at 1963-64 and a year later in the tropics. Although 14C has a relatively short half-life (5,730 years), elevated levels will persist long enough to be useable for several generations of geologists in the future.
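
To illustrate the persistence argument, the fraction of a radionuclide remaining in a closed archive such as a tree-ring follows simple exponential decay. The sketch below uses the 5,730-year half-life quoted above; the function name and the chosen time horizons are illustrative assumptions, and the calculation says nothing about detection limits or about the redistribution of atmospheric 14C into the oceans and biosphere.

```python
def fraction_remaining(elapsed_years: float, half_life_years: float) -> float:
    """Fraction of the original radionuclide left after elapsed_years of decay."""
    return 0.5 ** (elapsed_years / half_life_years)


HALF_LIFE_14C = 5730.0  # years, as quoted in the text

# Hypothetical time horizons chosen only for illustration.
for years in (100, 1_000, 10_000):
    share = fraction_remaining(years, HALF_LIFE_14C)
    print(f"After {years:>6} yr: {share:.1%} of the 14C remains")
```

On these assumptions almost 90% of the 14C fixed in a 1964 tree-ring would still be present after 1,000 years.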

While recognizing that many apparently novel industrially produced chemicals are occasionally produced in small quantities naturally, chemical signatures from long-lived well-mixed gases in glacier ice or sediments may also meet GSSP criteria. Potential long-lived gases are the halogenated gases, such as SF6, C2F6 and CF4 (with half-lives of 3,000 yr, 10,000 yr and 50,000 yr, respectively). Most were first manufactured industrially in the 1950s, and many are measurable in firn air, and with large enough samples could be measured in ice cores. But although they are measurable, distinct peaks are very recent and sometimes absent because major declines in industrial production are occurring after the negotiation and ratification of the 1989 Montreal and 2005 Kyoto protocols.

Of the various possible mid- to late-twentieth-century markers of the Great Acceleration, the global 14C peak provides an unambiguously global change in a number of stratigraphic deposits. We suggest that an unequivocally annual record is the optimal choice to reflect the 14C peak, thereby giving a dating accuracy of one year. We propose that the GSSP marker should be the 14C peak, at 1964, within dated annual rings of a pine tree (Pinus sylvestris) from King Castle, Niepołomice, 25 km east of Krakow, Poland (Fig. 2). Secondary correlated markers would include plutonium isotope ratios (240Pu/239Pu) in sediments indicating bomb testing, (fast-decaying) 137-Caesium, alongside the presence of peaks in very long-lived iodine isotopes (129I, with half-life 15.7 million years) found in marine sediments and soils.

While radionuclide fallout did not have major biological or other widespread physical repercussions, other auxiliary stratotypes may include the numerous other human-driven changes resulting in mid- to late-twentieth-century changes in geological deposits, including fossil pollen of novel genetically modified crops; declines in δ15N in Northern Hemisphere lakes and ice cores; the emergence of SF6 and CF4 from background levels; lead isotopes in ice cores; microplastics in marine sediments; diatom assemblages in lakes in response to eutrophication; and benthic foraminifera changes in marine sediments.


Dating the Anthropocene 


We conclude that most proposed Anthropocene start dates, including the earliest detectable human impacts, earliest widespread impacts, and historic events such as the Industrial Revolution, can probably be rejected because they are not derived from a globally synchronous marker. Our review highlights that only those environmental changes associated with well-mixed atmospheric gases provide clearly global synchronous geological markers on an annual or decadal scale, as is required to define a GSSP for the Anthropocene. The earliest potential GSSP primary marker we identify is the inflection of atmospheric methane at 5,020 yr BP (Fig. 2; Table 1), but correlated auxiliary stratotypes are lacking. Thus, the CH4 inflection is unlikely to be a strong candidate for the beginning of the Anthropocene. We find that only two other events, the Orbis spike dip in CO2 with a minimum at 1610 and the bomb spike 1964 peak in 14C, appear to fulfil the criteria for a GSSP to define the inception of the Anthropocene (Fig. 2; Table 1). While both GSSP dates have a number of correlated auxiliary stratotypes, there are advantages and disadvantages associated with each.

The main advantage to the 1610 Orbis spike is the geological and historical importance of the event. In common with other epoch boundaries, this boundary would document changes in climate, chemistry and palaeontological signals. Critically, the transoceanic movement of species is an unambiguously permanent change to the Earth system, and such a boundary would mark Earth’s last globally synchronous cool period before the long-term global warmth of the Anthropocene Epoch. Historically, the Industrial Revolution has often been considered as the most important event in relation to the inception of the Anthropocene, but we have not identified a clear global Industrial Revolution GSSP. However, in the view of many historians, industrialization and extensive fossil fuel use were only made possible by the annexing of the Americas. Before the Industrial Revolution both northwest Europe and southern China were similar in terms of life expectancy and material consumption patterns, including modest coal use, and both regions faced productive boundaries based on the available land area. Thus, the agricultural commodities from the vast new lands of the Americas allowed Europe to transcend its ecological limits and sustain economic growth. In turn, this freed labour, allowing Europe to industrialize. That is, the Americas made industrialization possible owing to the unprecedented inflow of new cheap resources (and profitable new markets for manufactured goods). This ‘Great Divergence’ of Europe from the rest of the world required access to and exploitation of new lands plus a rich source of easily exploitable energy: coal. Thus, dating the Anthropocene to start about 150 years before the beginning of the Industrial Revolution is consistent with a contemporary understanding of the likely material causes of the Industrial Revolution. The main disadvantage to the Orbis hypothesis is that a number of deposits may not show large changes around 1600, particularly in terms of biological material from the transport of species to new continents or oceans, because there are time-lags before species newly appear in geological deposits.

The key advantage of selecting 1964 as the base of a new Anthropocene Epoch is the sheer variety of human impacts recorded during the Great Acceleration: almost all stratigraphic records today, and over recent decades, have some marker of human activity. The latter part of the twentieth century is unambiguously a time of major anthropogenic global environmental impacts. One disadvantage is that although nuclear explosions have the capacity to fundamentally transform many aspects of Earth’s functioning, so far they have not done so, making the radionuclide spike a good GSSP marker but not an Earth-changing event. A further possible limitation in selecting such a recent date is that some deposits, notably some marine sediments, do not accumulate and stabilize over time spans as short as the past 50 years, making clear datable changes and correlation among some stratotypes sometimes difficult to discern.

Choosing between the 1610 Orbis and 1964 bomb spikes is challenging. As an alternative, a GSSA date, based on stratigraphic evidence, could be agreed upon by committee as the inception of the Anthropocene. However, any chosen date would be potentially open to challenge as arbitrary. For example, the Industrial Revolution is certainly a pivotal moment in human history, yet it is unclear how one could choose, based on the available geological evidence, an early Industrial Revolution GSSA date, say 1800, over a later date, perhaps 1850 or 1900. Similarly, the Great Acceleration is diachronous, and suggested GSSA dates could be 1945, 1950 or 1954 (ref. 109). Given such difficulties, given that GSSP markers are preferred, and given that candidate GSSP markers exist, a GSSA date seems unnecessary. Of the GSSP possibilities we tend to prefer 1610, because the transoceanic movement of species is a clear and permanent geological change to the Earth system. This date also fits more closely with Crutzen and Stoermer’s original proposal (ref. 1) of an important historical juncture, the Industrial Revolution, as the beginning of the Anthropocene, which has been enduringly popular and useful, suggesting 1610 may be similarly so.

We hope that identifying a limited number of possible events and GSSP markers may assist in focusing research efforts to select a robust GSSP alongside a series of auxiliary stratotypes. Such research might include compiling data sets of the first appearance of non-native species in lake and marine sediments to better document the transoceanic spread of species and improve the evidence on which the 1610 proposal is based. The reliable detection of 129I in high-resolution glacier ice and expanding the number of locations at which novel minerals, compounds and other recent human signals are investigated would advance the 1964 GSSP proposal.

Ratification of an Anthropocene Epoch would require a further decision to be made, that is, whether to retain the Holocene Epoch (Fig. 1). All Anthropocene GSSP choices would leave a complete Holocene Epoch at least three orders of magnitude shorter than any other epoch and similar to previous Pleistocene interglacials, which are not epoch-level events. Furthermore, the existence of a Holocene Epoch is due, in part, to the view, originating from nineteenth-century geologists, that the presence or influence of humans distinguished the Holocene from the Pleistocene. An Anthropocene Epoch, combined with today’s evidence that Homo sapiens is a Pleistocene species, removes key justifications for retaining the Holocene as an epoch-level designation. We therefore suggest that if the Anthropocene is accepted as an epoch it should directly follow the Pleistocene (Fig. 1c), as suggested independently elsewhere. If the Holocene ceases to be an epoch but refers instead to the final stage of the Pleistocene Epoch, we suggest that the term Holocenian Stage is used, to maintain consistency with current terminology. While an alternative informal geological term, the Flandrian stage, denotes the current interglacial as part of the Pleistocene, its use has strongly declined over recent decades, and would not be as recognizable as the Holocenian Stage. Re-classifying any pre-Anthropocene Epoch interglacial time unit as the Holocenian Stage will create the usual tension between resistance to altering past GTS agreements and the maintenance of GTS internal consistency.


The wider importance 


The choice of either 1610 or 1964 as the beginning of the Anthropocene would probably affect the perception of human actions on the environment. The Orbis spike implies that colonialism, global trade and coal brought about the Anthropocene. Broadly, this highlights social concerns, particularly the unequal power relationships between different groups of people, economic growth, the impacts of globalized trade, and our current reliance on fossil fuels. The onward effects of the arrival of Europeans in the Americas also highlight a long-term and large-scale example of human actions unleashing processes that are difficult to predict or manage. Choosing the bomb spike tells a story of an elite-driven technological development that threatens planet-wide destruction. The long-term advancement of technology deployed to kill people, from spears to nuclear weapons, highlights the more general problem of ‘progress traps’. Conversely, the 1963 Partial Test Ban Treaty and later agreements highlight the ability of people to collectively successfully manage a major global threat to humans and the environment. The event or date chosen as the inception of the Anthropocene will affect the stories people construct about the ongoing development of human societies.

Past scientific discoveries have tended to shift perceptions away from a view of humanity as occupying the centre of the Universe. In 1543 Copernicus’s observation of the Earth revolving around the Sun demonstrated that this is not the case. The implications of Darwin’s 1859 discoveries then established that Homo sapiens is simply part of the tree of life with no special origin. Adopting the Anthropocene may reverse this trend by asserting that humans are not passive observers of Earth’s functioning. To a large extent the future of the only place where life is known to exist is being determined by the actions of humans. Yet, the power that humans wield is unlike any other force of nature, because it is reflexive and therefore can be used, withdrawn or modified. More widespread recognition that human actions are driving far-reaching changes to the life-supporting infrastructure of Earth may well have increasing philosophical, social, economic and political implications over the coming decades.


Received 26 March 2014; accepted 12 January 2015. 


1. Crutzen, P. J. & Stoermer, E. F. The Anthropocene. IGBP Global Change Newsl. 41, 17-18 (2000).
This paper suggested that the Holocene has ended and the Anthropocene has begun, starting the contemporary increase in the usage of the term Anthropocene.

2. Crutzen, P. J. Geology of mankind. Nature 415, 23 (2002).

3. Steffen, W., Crutzen, P. J. & McNeill, J. R. The Anthropocene: are humans now overwhelming the great forces of nature. Ambio 36, 614-621 (2007).

4. Zalasiewicz, J., Williams, M., Haywood, A. & Ellis, M. The Anthropocene: a new epoch of geological time? Phil. Trans. R. Soc. Lond. A 369, 835-841 (2011).

5. Dalby, S. Biopolitics and climate security in the Anthropocene. Geoforum 49, 184-192 (2013).

6. Anon. The Anthropocene: a man-made world. The Economist May 26 (2011); http://www.economist.com/node/18741749.

7. Zalasiewicz, J. The Earth After Us: What Legacy Will Humans Leave in the Rocks? (Oxford University Press, 2008).

8. Autin, W. J. & Holbrook, J. M. Is the Anthropocene an issue of stratigraphy or pop culture? GSA Today 22, 60-61 (2012).

9. Gibbard, P. L. & Walker, M. J. C. The term ‘Anthropocene’ in the context of formal geological classification. Geol. Soc. Lond. Spec. Publ. 395, 29-37 (2014).
This paper presents a view that there is not currently enough evidence to formally ratify a new Anthropocene Epoch.

10. Gradstein, F. M., Ogg, J. G., Schmitz, M. D. & Ogg, G. M. The Geologic Time Scale 2012 (Elsevier, 2012).
This book is the latest GTS, including the formal assessments of Earth’s history divided into epochs, periods, eras and eons.

11. Finney, S. C. The ‘Anthropocene’ as a ratified unit in the ICS International Chronostratigraphic Chart: fundamental issues that must be addressed by the Task Group. Geol. Soc. Lond. Spec. Publ. 395, 23-28 (2014).
This paper details the requirements and questions that will need to be addressed by the initial committee that will recommend whether or not an Anthropocene epoch is to be formally defined.

12. Canfield, D. E., Glazer, A. N. & Falkowski, P. G. The evolution and future of Earth’s nitrogen cycle. Science 330, 192-196 (2010).

13. Ciais, P. et al. in Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (eds Stocker, T. F. et al.) Ch. 6, 465-570 (Cambridge Univ. Press, 2013).

14. Masson-Delmotte, V. et al. in Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (eds Stocker, T. F. et al.) Ch. 5, 383-464 (Cambridge Univ. Press, 2013).

15. Wolff, E. W. Ice Sheets and the Anthropocene. Geol. Soc. Lond. Spec. Publ. 395, 255-263 (2014).

16. International Geosphere-Biosphere Programme, Intergovernmental Oceanographic Commission, Scientific Committee on Oceanic Research. Ocean Acidification Summary for Policymakers – Third Symposium on the Ocean in a High-CO2 World (International Geosphere-Biosphere Programme, 2013), http://ocean-acidification.net/for-policymakers/.

17. Running, S. W. A measurable planetary boundary for the biosphere. Science 337, 1458-1459 (2012).

18. Krausmann, F. et al. Global human appropriation of net primary production doubled in the 20th century. Proc. Natl Acad. Sci. USA 110, 10324-10329 (2013).


19. 


20. 


21. 


22. 


23. 


24. 


25. 


26. 


27. 
28. 
29. 
30. 
31. 
32. 
33. 
34. 
35. 
36. 
37. 
38. 
39. 
40. 


41. 


42. 


43. 


44. 


45. 


46. 


47. 


48. 


49. 


50. 


51. 


52. 


53. 


54. 


Barnosky, A. D. etal. Has the Earth’s sixth mass extinction already arrived? Nature 
471, 51-57 (2011). 

Thomas, C. D. The Anthropocene could raise biological diversity. Nature 502, 7 
(2013). 

Baiser, B., Olden, J. D., Record, S., Lockwood, J. L. & McKinney, M. L. Pattern and 
process of biotic homogenization in the New Pangaea. Proc. R. Soc. Lond. B 279, 
4772-4777 (2012). 

Palumbi, S. R. Humans as the world’s greatest evolutionary force. Science 293, 
1786-1790 (2001). 

Darimont, C. T. eta/. Human predators outpace other agents of trait change in the 
wild. Proc. Natl Acad. Sci. USA 106, 952-954 (2009). 

Tabashnik, B. E., Mota-Sanchez, D., Whalon, M. E., Hollingworth, R. M. & Carriere, 
Y. Defining terms for proactive management of resistance to Bt crops and 
pesticides. J. Econ. Entomol. 107, 496-507 (2014). 

Stuart, Y. E. et al. Rapid evolution of a native species following invasion by a 
congener. Science 346, 463-466 (2014). 

Davis, R. V. Inventing the present: historical roots of the Anthropocene. Earth Sci. 
Hist. 30, 63-84 (2011). 

This paper investigates and reviews the history of the use of the terms 
‘Holocene’ and ‘Anthropocene’, showing that the Holocene includes humans 
in its first nineteenth-century definition. 

Rudwick, M. S. J. Bursting the Limits of Time: The Reconstruction of Geohistory in 
the Age of Revolution (University of Chicago Press, 2005). 

Jenkyn, T. W. Lessons in Geology XLVI. Chapter IV. On the effects of organic 
agents on the Earth’s crust. Popular Educator 4, 139-141 (1854). 

Jenkyn, T. W. Lessons in Geology XLIX. Chapter V. On the classification of rocks 
section IV. On the tertiaries Popular Educator 4, 312-316 (1854). 

Hansen, P. H. The Summits of Modern Man: Mountaineering after the 
Enlightenment (Harvard University Press, 2013). 

Haughton, S. Manual of Geology (Longman, 1865). 

Stoppani, A. Corso di Geologia Vol. Il (G. Bernardoni e G. Brigola, 1873). 

Dana, J. D. Manual of Geology (Theodore Bliss and Co., 1863). 

Le Conte, J. On critical periods in the history of the Earth and their relation to 
evolution; and on the Quaternary as such a period. Am. J. Sci. 14, 99-114 (1877). 
Lyell, C. Principles of Geology Volumes |, Il and Ill (University of Chicago Press, 
1990); originally published by John Murray, 1830-1833. 

Shantser, E. V. in Great Soviet Encyclopedia Vol. 2 (ed. Prokhorov, A. M.) 139-144 
(Macmillan, 1979). 

Vernadsky, W. |. Biosphere and Noosphere. Am. Sci. 33, 1-12 (1945). 

Walker, M. et al. Formal definition and dating of the GSSP (Global Stratotype 
Section and Point) for the base of the Holocene using the Greenland NGRIP ice 
core, and selected auxiliary records. J. Quat. Sci. 24, 3-17 (2009). 

Steffen, W., Grinevald, J., Crutzen, P. & McNeill, J. The Anthropocene: conceptual 
and historical perspectives. Phil. Trans. R. Soc. Lond. A 369, 842-867 (2011). 
Zalasiewicz, J. et al. Stratigraphy of the Anthropocene. Phil. Trans. R. Soc. Lond. A 
369, 1036-1055 (2011). 

Waters, C. N., Zalasiewicz, J.A., Williams, M., Ellis, M. A. & Snelling, A. M.A 
stratigraphical basis for the Anthropocene? Geol. Soc. Lond. Spec. Publ. 395, 
1-21 (2014). 

This paper reviews various stratigraphic markers relevant to defining the 
Anthropocene, with an up-to-date collation of the many markers coincident 
with the Industrial Revolution and the Great Acceleration. 

Glikson, A. Fire and human evolution: the deep-time blueprints of the 
Anthropocene. Anthropocene 3, 89-92 (2013). 

Ruddiman, W. F. The Anthropocene. Annu. Rev. Earth Planet. Sci. 41, 45-68 
(2013). 

This paper summarizes the data and arguments that human activity altered 
CO, and CH, emissions thousands of years ago, leading to a delayed next 
glaciation, known as the Early Anthropogenic Hypothesis. 

Foley, S. F. et al. The Palaeoanthropocene—the beginnings of anthropogenic 
environmental change. Anthropocene 3, 83-88 (2013). 

Balter, M. Archaeologists say the ‘Anthropocene’ is here—but it began long ago. 
Science 340, 261-262 (2013). 

Fischer-Kowalski, M., Krausmann, F. & Pallua, |. A sociometabolic reading of the 
Anthropocene: modes of subsistence, population size and human impact on 
Earth. Anthropocene Rev. 1, 8-33 (2014). 

This paper takes an alternative view of the Anthropocene, considering human 
energy sources, and posits two transitions, to an agricultural mode, about 
10,000 yr sp, and to an industrial mode, which begins after 1500. 
Zalasiewicz, J., Williams, M. & Waters, C. N. Can an Anthropocene series be 
defined and recognized? Geol. Soc. Lond. Spec. Publ. 395, 39-53 (2014). 
Roebroeks, W. & Villa, P. On the earliest evidence for habitual use of fire in Europe. 
Proc. Natl Acad. Sci. USA 108, 5209-5214 (2011). 

Barnosky, A. D. Palaeontological evidence for defining the Anthropocene. Geol. 
Soc. Lond. Spec. Publ. 395, 149-165 (2014). 

Barnosky, A. D., Koch, P. L., Feranec, R.S., Wing, S.L. & Shabel, A. B. Assessing the 
causes of Late Pleistocene extinctions on the continents. Science 306, 70-75 
(2004). 
Lorenzen, E. D. etal. Species-specific responses of Late Quaternary megafauna to 
climate and humans. Nature 479, 359-364 (2011). 

Ellis, E. C. et al. Used planet: a global history. Proc. Natl Acad. Sci. USA 110, 
7978-7985 (2013). 
Certini, G. & Scalenghe, R. Anthropogenic soils are the golden spikes for the 
Anthropocene. Holocene 21, 1269-1274 (2011). 

Gale, S. J. & Hoare, P. G. The stratigraphic status of the Anthropocene. Holocene 
22, 1491-1494 (2012). 




Tzedakis, P. C., Channell, J. E. T., Hodell, D. A., Kleiven, H. F. & Skinner, L. C. 
Determining the natural length of the current interglacial. Nature Geosci. 5, 
138-141 (2012). 

Broecker, W. C. & Stocker, T. F. The Holocene CO2 rise: Anthropogenic or natural? 
Eos 87, 27-29 (2006). 

Stocker, B. D., Strassmann, K. & Joos, F. Sensitivity of Holocene atmospheric CO2 
and the modern carbon budget to early human land use: analyses with a 
process-based model. Biogeosciences 8, 69-88 (2011). 

Kaplan, J. O. et al. Holocene carbon emissions as a result of anthropogenic land 
cover change. Holocene 21, 775-791 (2011). 

Blunier, T., Chappellaz, J., Schwander, J., Stauffer, B. & Raynaud, D. Variations in 
atmospheric methane concentration during the Holocene epoch. Nature 374, 
46-49 (1995). 

Sapart, C. J. et al. Natural and anthropogenic variations in methane sources 
during the past two millennia. Nature 490, 85-88 (2012). 

Singarayer, J. S., Valdes, P. J., Friedlingstein, P., Nelson, S. & Beerling, D. J. Late 
Holocene methane rise caused by orbitally controlled increase in tropical 
sources. Nature 470, 82-85 (2011). 

Diamond, J. Guns, Germs and Steel: A Short History of Everybody for the Last 13,000 
Years (Chatto and Windus, 1997). 

Mann, C. C. 1493: How the Ecological Collision of Europe and the Americas Gave 
Rise to the Modern World (Granta, 2011). 

Crosby, A. W. The Columbian Exchange: Biological and Cultural Consequences of 
1492 30 yr edn (Praeger, 2003). 

Mercuri, A. M. et al. A marine/terrestrial integration for mid-late Holocene 
vegetation history and the development of the cultural landscape in the Po valley 
as a result of human impact and climate change. Vegetat. Hist. Archaeobot. 21, 
353-372 (2012). 

Piperno, D. R. Identifying crop plants with phytoliths (and starch grains) in 
Central and South America: a review and an update of the evidence. Quat. Int. 
193, 146-159 (2009). 

Zalasiewicz, J. & Williams, M. The Anthropocene: a comparison with the 
Ordovician-Silurian boundary. Rendiconti Lincei-Scienze Fisiche E Naturali 25, 
5-12 (2014). 

Denevan, W. M. The Native Population of the Americas in 1492 2nd edn (University 
of Wisconsin Press, 1992). 

Mann, C. C. 1491: New Revelations of the Americas Before Columbus (Vintage, 
2005). 

Nevle, R. J. & Bird, D. K. Effects of syn-pandemic fire reduction and reforestation 
in the tropical Americas on atmospheric CO2 during European conquest. 
Palaeogeogr. Palaeoclimatol. Palaeoecol. 264, 25-38 (2008). 

This paper presents a synthesis of data computing the impacts of the rapid 
1492-1650 reduction in population across the Americas and the carbon 
uptake implications. 

Dull, R. A. et al. The Columbian encounter and the Little Ice Age: abrupt land use 
change, fire, and greenhouse forcing. Ann. Assoc. Am. Geogr. 100, 755-771 
(2010). 

Nevle, R. J., Bird, D. K., Ruddiman, W. F. & Dull, R. A. Neotropical human- 
landscape interactions, fire, and atmospheric CO2 during European conquest. 
Holocene 21, 853-864 (2011). 

Ahn, J. et al. Atmospheric CO2 over the last 1000 years: a high-resolution record 
from the West Antarctic Ice Sheet (WAIS) divide ice core. Glob. Biogeochem. 
Cycles 26, GB2027 (2012). 

Rubino, M. et al. A revised 1000 year atmospheric delta C-13-CO2 record from 
Law Dome and South Pole, Antarctica. J. Geophys. Res. D 118, 8482-8499 
(2013). 

MacFarling Meure, C. et al. Law Dome CO2, CH4 and N2O ice core records 
extended to 2000 years BP. Geophys. Res. Lett. 33, L14810 (2006). 

Etheridge, D. M., Steele, L. P., Francey, R. J. & Langenfelds, R. L. Atmospheric 
methane between 1000 AD and present: evidence of anthropogenic emissions 
and climatic variability. J. Geophys. Res. D 103, 15979-15993 (1998). 

Smith, V.C. Volcanic markers for dating the onset of the Anthropocene. Geol. Soc. 
Lond. Spec. Publ. 395, 283-299 (2014). 

de Silva, S. L. & Zielinski, G. A. Global influence of the AD1600 eruption of 
Huaynaputina, Peru. Nature 393, 455-458 (1998). 

Thompson, L. G. et al. Annually resolved ice core records of tropical climate 
variability over the past ~1800 Years. Science 340, 945-950 (2013). 

Power, M. J. et al. Climatic control of the biomass-burning decline in the Americas 
after AD 1500. Holocene 23, 3-13 (2013). 

Wang, Z., Chappellaz, J., Park, K. & Mak, J. E. Large variations in Southern 
Hemisphere biomass burning during the last 650 years. Science 330, 
1663-1666 (2010). 
Ferretti, D. F. et al. Unexpected changes to the global methane budget over the 
past 2000 years. Science 309, 1714-1717 (2005). 

Mischler, J. A. et al. Carbon and hydrogen isotopic composition of methane over 
the last 1000 years. Glob. Biogeochem. Cycles 23, GB4024 (2009). 

Mitchell, L. E., Brook, E. J., Sowers, T., McConnell, J. R. & Taylor, K. Multidecadal 
variability of atmospheric methane, 1000-1800 CE. J. Geophys. Res. 116, 
G02007 (2011). 

Bush, M. B. & Colinvaux, P. A. Tropical forest disturbance: Paleoecological 
records from Darien, Panama. Ecology 75, 1761-1768 (1994). 

Kinnard, C. et al. Reconstructed changes in Arctic sea ice over the past 1,450 
years. Nature 479, 509-512 (2011). 

Neukom, R. et al. Inter-hemispheric temperature variability over the past 
millennium. Nature Clim. Change 4, 362-367 (2014). 

This paper synthesizes paleoclimate records from the southern and northern 
hemispheres, showing one globally synchronous cool period (1594-1677) 
and one globally synchronous warm period (1965 onwards) within the last 
1,000 years. 

Pomeranz, K. The Great Divergence: China, Europe, and the Making of the Modern 
World Economy (Princeton University Press, 2000). 

Wallerstein, I. The Modern World-System I: Capitalist Agriculture and the Origins of 
the European World-Economy in the Sixteenth Century (Academic Press, 1974). 
Killick, D. & Fenn, T. Archaeometallurgy: the study of preindustrial mining and 
metallurgy. Annu. Rev. Anthropol. 41, 559-575 (2012). 

Cooke, C. A., Balcom, P. H., Biester, H. & Wolfe, A. P. Over three millennia of 
mercury pollution in the Peruvian Andes. Proc. Natl Acad. Sci. USA 106, 
8830-8834 (2009). 

Hong, S. M., Candelone, J. P., Patterson, C. C. & Boutron, C. F. History of ancient 
copper smelting pollution during Roman and medieval times recorded in 
Greenland ice. Science 272, 246-249 (1996). 

Rose, N. L. & Appleby, P. G. Regional applications of lake sediment dating by 
spheroidal carbonaceous particle analysis I: United Kingdom. J. Paleolimnol. 34, 
349-361 (2005). 

Snowball, I., Hounslow, M. W. & Nilsson, A. Geomagnetic and mineral magnetic 
characterization of the Anthropocene. Geol. Soc. Lond. Spec. Publ. 395, 119-141 
(2014). 

Wolfe, A. P. et al. Stratigraphic expressions of the Holocene-Anthropocene 
transition revealed in sediments from remote lakes. Earth Sci. Rev. 116, 17-34 
(2013). 

Holtgrieve, G. W. et al. A coherent signature of Anthropogenic nitrogen deposition 
to remote watersheds of the Northern Hemisphere. Science 334, 1545-1548 
(2011). 

Gałuszka, A., Migaszewski, Z. M. & Zalasiewicz, J. Assessing the Anthropocene 
with geochemical methods. Geol. Soc. Lond. Spec. Publ. 395, 221-238 
(2014). 

Falkowski, P. et al. The global carbon cycle: a test of our knowledge of Earth as a 
system. Science 290, 291-296 (2000). 

Fairchild, I. J. & Frisia, S. Definition of the Anthropocene: a view from the 
underworld. Geol. Soc. Lond. Spec. Publ. 395, 239-254 (2014). 

Hua, Q. Radiocarbon: a chronological tool for the recent past. Quat. Geochronol. 4, 
378-390 (2009). 

Harnisch, J. & Eisenhauer, A. Natural CF4 and SF6 on Earth. Geophys. Res. Lett. 25, 
2401-2404 (1998). 

Butler, J. H. et al. A record of atmospheric halocarbons during the twentieth 
century from polar firn air. Nature 399, 749-755 (1999). 

Rakowski, A. Z. et al. Radiocarbon method in environmental monitoring of CO2 
emission. Nucl. Instrum. Methods Phys. Res. B 294, 503-507 (2013). 

Ketterer, M. E. et al. Resolving global versus local/regional Pu sources in the 
environment using sector ICP-MS. J. Anal. At. Spectrom. 19, 241-245 (2004). 
Fehn, U. et al. Determination of natural and anthropogenic I-129 in marine 
sediments. Geophys. Res. Lett. 13, 137-139 (1986). 

Hansen, V., Roos, P., Aldahan, A., Hou, X. & Possnert, G. Partition of iodine (I-129 
and I-127) isotopes in soils and marine sediments. J. Environ. Radioact. 102, 
1096-1104 (2011). 

Schurer, A. P., Hegerl, G. C., Mann, M. E., Tett, S. F. B. & Phipps, S. J. Separating 
forced from chaotic climate variability over the past millennium. J. Clim. 26, 
6954-6973 (2013). 
Steffen, W., Broadgate, W., Deutsch, L., Gaffney, O. & Ludwig, C. The trajectory of 
the Anthropocene: the Great Acceleration. Anthropocene Rev. http://dx.doi.org/ 
10.1177/2053019614564785 (in the press). 


Zalasiewicz, J. et al. When did the Anthropocene begin? A mid-twentieth century 
boundary level is stratigraphically optimal. Quat. Int. 
http://dx.doi.org/10.1016/j.quaint.2014.11.045 (in the press). 

van der Pluijm, B. Hello Anthropocene, goodbye Holocene. Earth’s Future 2, 
2014EF000268 (2014). 


Wright, R. A Short History of Progress (House of Anansi Press, 2004). 


Shakun, J. D. et al. Global warming preceded by increasing carbon dioxide 
concentrations during the last deglaciation. Nature 484, 49-54 (2012). 
Monnin, E. et al. Atmospheric CO2 concentrations over the last glacial 
termination. Science 291, 112-114 (2001). 

Veres, D. et al. The Antarctic ice core chronology (AICC2012): an optimized multi- 
parameter and multi-site dating approach for the last 120 thousand years. Clim. 
Past 9, 1733-1748 (2013). 

Marcott, S.A., Shakun, J. D., Clark, P. U. & Mix, A. C. A reconstruction of regional 
and global temperature for the past 11,300 years. Science 339, 1198-1201 
(2013). 

Alexander, L. V. et al. in Climate Change 2013: The Physical Science Basis. 
Contribution of Working Group I to the Fifth Assessment Report of the 
Intergovernmental Panel on Climate Change (eds Stocker, T. F. et al.) 3-28 
(Cambridge Univ. Press, 2013). 

Indermühle, A. et al. Holocene carbon-cycle dynamics based on CO2 trapped in 
ice at Taylor Dome, Antarctica. Nature 398, 121-126 (1999). 

Siegenthaler, U. et al. Supporting evidence from the EPICA Dronning Maud Land 
ice core for atmospheric CO2 changes during the past millennium. Tellus B 57, 
51-57 (2005). 

Ahn, J. et al. CO2 diffusion in polar ice: observations from naturally formed 
CO2 spikes in the Siple Dome (Antarctica) ice core. J. Glaciol. 54, 685-695 
(2008). 

Marin-Spiotta, E. & Sharma, S. Carbon storage in successional and 
plantation forest soils: a tropical analysis. Glob. Ecol. Biogeogr. 22, 105-117 
(2013). 




121. Bonner, M. T. L., Schmidt, S. & Shoo, L. P. A meta-analytical global comparison of 
aboveground biomass accumulation between tropical secondary forests and 
monoculture plantations. For. Ecol. Manage. 291, 73-86 (2013). 

122. Pongratz, J., Caldeira, K., Reick, C. H. & Claussen, M. Coupled climate-carbon 
simulations indicate minor global effects of wars and epidemics on atmospheric 
CO2 between AD 800 and 1850. Holocene 21, 843-851 (2011). 

123. Orihuela-Belmonte, D. E. et al. Carbon stocks and accumulation rates in tropical 
secondary forests at the scale of community, landscape and forest type. Agric. 
Ecosyst. Environ. 171, 72-84 (2013). 

124. Francey, R. J. et al. A 1000-year high precision record of δ13C in atmospheric CO2. 
Tellus B 51, 170-193 (1999). 

125. Trudinger, C. M., Enting, I. G., Francey, R. J., Etheridge, D. M. & Rayner, P. J. Long- 
term variability in the global carbon cycle inferred from a high-precision CO2 and 
δ13C ice-core record. Tellus B 51, 233-248 (1999). 

126. Böhm, F. et al. Evidence for preindustrial variations in the marine surface water 
carbonate system from coralline sponges. Geochem. Geophys. Geosyst. 3, 1-13 
(2002). 

127. Trudinger, C. M., Enting, I. G., Rayner, P. J. & Francey, R. J. Kalman filter analysis of 
ice core data—2. Double deconvolution of CO2 and δ13C measurements. 
J. Geophys. Res. D 107, D20 (2002). 




Acknowledgements We acknowledge C. Hamilton, whose idea that humans are a 
reflexive power rather than a force of nature was presented at the ‘Thinking the 
Anthropocene’ conference in Paris on 15 November 2013, and is used with permission. 
We thank J. Kaplan and K. Krumhardt for the estimates of the population of the 
Americas, M. Irving for assistance with the figures, and C. Brierley, M.-E. Carr, 
W. Laurance, A. Mackay, O. Morton, R. Newman and C. Tzedakis for constructive 
discussion and remarks, and reviewer P. Gibbard for important comments. This work 
was funded by the European Research Council (T-FORCES, S.L.L.), a Philip Leverhulme 
Prize award (S.L.L.), and a Royal Society Wolfson Research Merit Award (M.A.M.). 


Author Contributions S.L.L. and M.A.M. conceived the paper structure. S.L.L. 
conceived and developed the Orbis hypothesis. S.L.L. wrote the geological importance, 
historical, farming and Orbis evidence reviews. M.A.M. wrote the Pleistocene, and 
industrialization and Great Acceleration evidence reviews. M.A.M. conceived and 
developed the figures. The final two sections, written by S.L.L., emerged from 
discussions between S.L.L. and M.A.M. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to S.L.L. (s.l.lewis@ucl.ac.uk). 




ARTICLE 


doi:10.1038/nature14279 


Quantitative evolutionary dynamics 
using high-resolution lineage tracking 
Evolution of large asexual cell populations underlies ~30% of deaths worldwide, including those caused by bacteria, 
fungi, parasites, and cancer. However, the dynamics underlying these evolutionary processes remain poorly understood 
because they involve many competing beneficial lineages, most of which never rise above extremely low frequencies in 
the population. To observe these normally hidden evolutionary dynamics, we constructed a sequencing-based ultra 
high-resolution lineage tracking system in Saccharomyces cerevisiae that allowed us to monitor the relative frequencies 
of ~500,000 lineages simultaneously. In contrast to some expectations, we found that the spectrum of fitness effects of 
beneficial mutations is neither exponential nor monotonic. Early adaptation is a predictable consequence of this spec- 
trum and is strikingly reproducible, but the initial small-effect mutations are soon outcompeted by rarer large-effect 
mutations that result in variability between replicates. These results suggest that early evolutionary dynamics may be 
deterministic for a period of time before stochastic effects become important. 


A major focus of biomedical research has been to identify mutations 
responsible for increased pathogenicity, cancer progression, or drug 
resistance in large evolving asexual cell populations. Yet, even characterizing 
all mutations underlying a disease is not sufficient to understand 
its progression. Rather, a quantitative understanding of the evolution- 
ary dynamics is necessary to determine which adaptive mutations con- 
tribute significantly to driving the population fitness higher, and which 
are serendipitous or inconsequential. Mutations identified through 
genome sequencing are likely to constitute only the ‘tip of the iceberg’, 
with many beneficial mutations that impact the evolutionary dynamics 
never rising above extremely low frequencies. 

A lineage trajectory, the size of a small subpopulation of cells over 
time, can be used to discover a beneficial mutation present at an extremely 
low frequency, and to measure its time of occurrence and 
selective advantage (Fig. 1a). A lineage increasing in size faster than 
can be explained by stochastic drift indicates that a beneficial mutation 
has occurred and risen to a high enough frequency to grow almost deterministically 
(that is, it has established). Most beneficial mutations 
will drift to extinction before establishing (Supplementary Information 
section 4.1 and 4.4). For those that do establish, the exponential rate at 
which a lineage grows is a measure of the fitness effect (s) of the mutation. 
Extrapolating back the exponential growth, the establishment 
time (t) can be inferred: this is a rough estimate of when the mutation 
occurred (Supplementary Information section 4.1 and 4.2). A systematic 
characterization of the distributions of s and t for beneficial mutations 
has been lacking, although these are fundamental to the evolutionary 
dynamics of large populations. 
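
The extrapolation described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the inference procedure used in the study: it assumes that an established lineage grows as n(g) ≈ (1/s)·exp(s·(g − t)), where g is the generation, and that the population mean fitness is still negligible. It fits a straight line to the logarithm of lineage size, takes the slope as s, and takes the establishment time t as the generation at which the fitted line crosses a size of 1/s. The function name and the synthetic data are hypothetical.

```python
import math

def fit_s_and_establishment_time(generations, sizes):
    """Least-squares fit of ln(size) = s * generation + b.

    Returns (s, t_est), where t_est is the generation at which the fitted
    line crosses ln(1/s). Assumes noise-free exponential growth and a
    negligible population mean fitness (both simplifying assumptions).
    """
    logs = [math.log(n) for n in sizes]
    k = len(generations)
    mean_g = sum(generations) / k
    mean_y = sum(logs) / k
    sxx = sum((g - mean_g) ** 2 for g in generations)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(generations, logs))
    s = sxy / sxx                      # exponential growth rate per generation
    b = mean_y - s * mean_g            # intercept of the fitted line
    t_est = (math.log(1.0 / s) - b) / s
    return s, t_est

# Synthetic lineage with s = 8% per generation, established at generation 10.
s_true, t_true = 0.08, 10.0
gens = [40, 48, 56, 64]
sizes = [(1 / s_true) * math.exp(s_true * (g - t_true)) for g in gens]
s_hat, t_hat = fit_s_and_establishment_time(gens, sizes)
```

On real trajectories the growth rate is reduced by the rising mean fitness and the counts are noisy, so a plain log-linear fit would only be a rough first pass.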

The major experimental challenge is developing a method to quan- 
titatively measure the trajectories of large numbers of small lineages. 
Large lineages will accumulate multiple beneficial mutations contem- 
poraneously, confounding measurements of s and t (Fig. 1a, multiple 
mutations, Supplementary Information section 4.5). Small lineages are 
unlikely to acquire a beneficial mutation at all, so many trajectories must 
be observed to characterize the distributions of s and t. DNA barcodes 
offer a powerful way to simultaneously track multiple lineages, yet 


technical barriers have limited the number of barcodes that can be inserted 
into cells. Here we constructed a system capable of inserting 
~500,000 random DNA barcodes into an initially clonal yeast population. 
Using this system in populations of ~10⁸ cells growing in a 
defined glucose-limited minimal medium, we identified ~25,000 lineages 
that gained a beneficial mutation within ~168 generations, measured 
s and t for each, and determined the spectrum of mutation rates to 
each fitness effect. This spectrum results in a deterministic increase in 
the mean population fitness early, with stochastic events governing its 
trajectory later. 


Lineage tracking with random barcodes 


We generated yeast lineages by inserting a random 20-nucleotide bar- 
code at a single location in the genome (Fig. 1b, Supplementary Infor- 
mation section 1.3). To achieve a large number of integration events, 
we inserted a ‘landing pad’ into a neutral location in the yeast genome 
that allows for high-frequency, site-specific genomic integration of plasmids 
via the Cre-loxP recombination system. A plasmid library 
containing ~500,000 random barcodes was inserted into the genome 
at the landing pad. Barcoding requires ~48 generations of growth from 
a common ancestor (Extended Data Fig. 1). Adaptive mutations begin 
to occur during this initial growth and can be carried forward into the 
evolution. 
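
To give a sense of why 20 random nucleotides are ample for ~500,000 lineages, the sketch below counts the possible barcodes and estimates how many pairs of lineages would be expected to share a barcode by chance. It assumes, purely for illustration, that every barcode is drawn independently and uniformly from all 4^20 sequences; real libraries have synthesis and cloning biases, so this is an idealized estimate. The function name is hypothetical.

```python
def expected_shared_barcode_pairs(n_lineages, barcode_length=20, alphabet=4):
    """Expected number of lineage pairs carrying identical barcodes,
    assuming independent, uniformly random barcodes (an idealization)."""
    n_possible = alphabet ** barcode_length          # 4**20 distinct sequences
    n_pairs = n_lineages * (n_lineages - 1) // 2     # all unordered pairs
    return n_pairs / n_possible

n_possible_barcodes = 4 ** 20
pairs = expected_shared_barcode_pairs(500_000)       # roughly 0.11
```

Under this assumption, fewer than one coincidental barcode match is expected in the whole library.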

The same barcoded yeast library was evolved in replicate experiments 
(E1 and E2) for ~168 generations in serial batch culture, diluting 1:250 
every ~8 generations, with a bottleneck population size of ~7 × 10⁷ 
(Extended Data Fig. 1, Supplementary Information section 4.4). To count 
the relative frequency of each lineage across time, we isolated genomic 
DNA from the pooled population, amplified lineage tags using a two- 
step PCR protocol, and sequenced amplicons (Fig. 1b, Supplementary 
Information section 1.5 and 5.2). 
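
The dilution factor and the number of generations per cycle quoted above are linked by simple arithmetic: a population diluted 1:250 must double log2(250) times to regrow to its previous size. The short check below uses only the figures given in the text (1:250, ~8 generations, ~168 generations); the function name is hypothetical.

```python
import math

def generations_per_cycle(dilution_factor):
    """Doublings needed to regrow after a 1:dilution_factor dilution."""
    return math.log2(dilution_factor)

gens = generations_per_cycle(250)      # just under 8 generations per cycle
n_cycles = round(168 / round(gens))    # number of growth-bottleneck cycles
```

The result, a little under 8 doublings per cycle, matches the ~8 generations stated, and ~168 generations then corresponds to 21 growth-bottleneck cycles.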

Plotting the relative frequency of each barcode over ~168 genera- 
tions shows a reproducible pattern of population dynamics across rep- 
licates (Fig. 2a and Extended Data Fig. 2a). Most lineages declined in 
frequency (blue lines, neutral lineages), but a modest fraction (~5%, 

Figure 1 | Lineage tracking with random barcodes. a, Typical lineage trajectories. A small 
lineage that does not acquire a beneficial mutation (neutral, blue) will fluctuate in size due to drift 
before eventually being outcompeted. Rarely, a lineage will acquire a beneficial mutation (star) 
with a fitness effect of s (adaptive, red). In most cases, this beneficial mutation is lost to drift. If the 
beneficial mutants drift to a size >~1/s (lower dotted horizontal line), the lineage will begin to 
grow exponentially at a rate s. Extrapolating the exponential growth to the time at which the 
mutation is inferred to have reached a size ~1/s yields the establishment time (t, dashed vertical line) 
which roughly corresponds to the time when the mutation occurred with an uncertainty of ~1/s. At 
sizes >~1/U_b (upper dotted horizontal line), where U_b is the total beneficial mutation rate, the 
lineage will acquire additional beneficial mutations. b, Barcode insertion and sequencing. 
Left, sequences containing random 20 nucleotide barcodes (colours) are inserted first into a plasmid 
and then into a specific location in the genome. Bottom, recombination between two partially 
crippled loxP sites (loxP*) integrates the plasmid into the genome and completes a URA3 selectable 
marker, resulting in one functional and one crippled loxP site (loxP**). The URA3 marker is 
interrupted by an artificial intron containing the barcode. Right, to measure relative fitness, cells are 
passed through growth-bottleneck cycles of ~8 generations. Before each bottleneck, genomic DNA 
is extracted, lineage barcode tags are amplified using a two-step PCR protocol, and amplicons are 
sequenced. By inserting unique molecular identifiers (also short random barcodes, grey 

see below) had acquired a beneficial mutation that established (red lines, 
adaptive lineages). At later time points, the growth of adaptive lineages 
attenuates as the population mean fitness increases (clonal interference). 

To calculate the probability that a lineage contains an adaptive mutation, 
one must differentiate a trajectory that increases due 
to an adaptive event from one that increases due to genetic drift and 
measurement errors. Because either scenario is rare, the right-hand tail 
of the distribution of read numbers is particularly important. Thus, we 
characterized the full distribution of noise that results from drift and 
sampling errors due to DNA isolation, amplification and sequencing, 
(black curve, Extended Data Fig. 2b, Supplementary Information sec- 
tion 5). The decline in frequency of neutral lineages is used to infer the 
increase in mean fitness of the population (Fig. 2a and Extended Data 
Fig. 2a, b). 
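
One way to picture how the decline of neutral lineages reveals the mean fitness is the following sketch. It assumes, as a simplification not taken from the text, that a neutral lineage's frequency falls as exp(−x̄·Δg) over an interval of Δg generations during which the population mean fitness relative to the ancestor is x̄, and that drift and measurement noise can be ignored. The function name and the example frequencies are hypothetical.

```python
import math

def mean_fitness_from_neutral_decline(freq_start, freq_end, generations):
    """Average population mean fitness (per generation, relative to the
    ancestor) implied by the decline of a neutral lineage's frequency.
    Assumes purely deterministic decline, with no drift or sampling noise."""
    return math.log(freq_start / freq_end) / generations

# A neutral lineage observed across one ~8-generation cycle.
f0 = 2e-6
f1 = f0 * math.exp(-0.03 * 8)          # decline consistent with 3% mean fitness
x_bar = mean_fitness_from_neutral_decline(f0, f1, 8)
```

In practice many neutral lineages would be pooled, because any single small lineage is dominated by the noise sources described above.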
Using our estimates of noise and the mean fitness, we calculate the 
probability that a trajectory is explained by a mutation with fitness effect, 
s, having an establishment time, t, over a broad range of s and t (under 
a uniform prior in t and an exponential prior in s, Extended Data Fig. 2c). 
If this exceeds the probability that no beneficial mutation occurs, we 
define the lineage as adaptive, with the peak of the probability our best 
estimate of s and t (Supplementary Information section 7). Estimates 
of s and t for each adaptive lineage are combined to calculate a second 
measurement of the increase in mean fitness (Fig. 2a and Extended Data 
Fig. 2a, insets). Our two methods to infer mean fitness agree, indicating 
that most lineages driving the mean fitness have been detected. Uncer- 
tainties in s and t depend on the specific lineage trajectory; however, 
they are generally low (Δs ± 0.5%, Δt ± 10 generations, Supplementary 
Information section 7.7). 


To validate estimates of s and t, we first analysed a simulated data 
set with comparable levels of noise to our experiment (Supplementary 
Information section 12). We find a strong correlation between the known 
and inferred values for both s (R² = 0.88 in Fig. 2b) and t (R² = 0.93 in 
Fig. 2c). Second, we picked 33 clones from generation 88 that belong to 
different adaptive lineages and performed pairwise competitive fitness 
assays on each (Supplementary Information section 2). We find a strong 
correlation between these two methods (R² = 0.81, Fig. 2d). Outliers 
(lighter coloured data points) are likely due to a neutral cell being sam- 
pled from a lineage containing mostly adaptive cells. Other deviations 
could be due to interactions between adaptive lineages (that is, frequency-dependent 
fitness) or to multiple mutations on the same genome (Supplementary 
Information section 8). 

In total, ~25,000 beneficial mutations with a fitness effect of >2% 
established before generation 112 in E1 (Fig. 3a), a number that is roughly 
consistent with E2 (Extended Data Fig. 3a) and simulated data (Sup- 
plementary Figs 44 and 45 and Supplementary Information section 12). 
Adaptation occurs quickly: by generation 112 the population mean fit- 
ness is over 5% higher than the ancestor, with some lineages having a 
fitness advantage of >10%. E1 and E2 share 48 generations of common 
growth. During this time, ~6,000 lineages acquire a beneficial muta- 
tion that is sampled into, and establishes in, both replicates (Fig. 3a and 
Extended Data Fig. 3a, purple circles). We define these mutations as 
‘pre-existing’: their presence is not an artefact of our experiment, but a 
general expectation for large populations grown from a single cell. 


Beneficial mutation rates 


To estimate the spectrum of beneficial mutation rates in the serial batch 
conditions, we consider only lineages that are identified as adaptive in 
one replicate but not the other (that is, are unlikely to contain muta- 
tions that occurred before barcoding, Supplementary Information sec- 
tion 9 and 10). Analysing the total number of cells with each s yields the 
best estimate of the mutation rate spectrum (Fig. 3a and Extended Data 


Fig. 3a, insets, and Supplementary Information section 11.1). These esti- 
mates are worse for fitness effects that have only occurred a few times. We 
find that beneficial mutations with s > 5% occur at a rate of ~1 × 10⁻⁶ 
per cell per generation (Supplementary Information section 11.2, Fig. 3a 
and Extended Data Fig. 3a, insets), a rate that is consistent across replicates. 
Using a fluctuation test, we find that the ancestor to our barcoded 
strains has a spontaneous mutation rate in non-repeat regions 
of ~4 × 10⁻¹⁰ per nucleotide per generation (Supplementary Information 
section 1.7). This implies that mutations in ~0.04% of the 
genome, ~5,000 bases, confer beneficial fitness effects of >5%. This 
target size is broadly consistent with previous reports, although it 
will certainly depend on the selective conditions. The beneficial muta- 
tion rate includes all events that have a heritable effect on fitness, and 
could include point mutations, indels, large genomic rearrangements 
or duplications, whole-genome duplications, and possibly even herita- 
ble epigenetic modifications. Reported beneficial mutation rates depend 
on the range of fitness effects that can establish and be detected. For 
example, if we include lower fitness effect mutations that are mostly 
pre-existing (2% < s < 5%), we find a beneficial mutation rate that is 
~50% higher. However, as we discuss below, the total beneficial muta- 
tion rate is not necessary for a predictive understanding of the evolu- 
tionary dynamics. Instead, knowledge of the rate of mutation to the 
range of fitness effects that drive the dynamics is what is needed. 


Mutation rate spectrum 

Several authors have used extreme value theory to predict that the spectrum 
of beneficial mutation rates is exponential, with some experiments 
that sample small numbers of beneficial mutations supporting 
or contradicting this prediction. We do not find support for an exponential 
or even a monotonically decreasing distribution. Rather, most 
mutations we observe are confined to a narrow range of fitness effects 
(2% < s < 5%). At larger fitnesses, the distribution is relatively flat with 
two slight peaks in the fitness ranges 7-8% and 10-11%, a feature that is 
Figure 3 | Fitness effects, establishment times, and population dynamics. 
a, Scatter plot of t and s of all ~25,000 beneficial mutations (circles) identified 
in E1. Circle area represents the size of the lineage at generation 88. Purple 
circles indicate lineages with mutations that occurred in the period of common 
growth (t < 0) that were sampled into, and established in, E1 and E2. Green 
circles indicate lineages that were identified as adaptive in only one replicate 
and likely contain mutations that arose after t = 0. Lines indicate the time limits 
before which mutations must occur in order to establish (large dash) or be 
observed (small dash). These limits trail the mean fitness (solid line) by ~1/s 
generations. Inset, the spectrum of mutation rates, μ(s), as a function of fitness 


consistent across replicates (Fig. 3a and Extended Data Fig. 3a, insets). 
Mutation rates to these two peaks are consistent with genomic target 
sizes of loss-of-function mutations for a single gene (~300 base pairs); 
these have previously been shown to be adaptive in yeast grown in 
simple environments. Weaker effect mutations (s < 2%), which are 
hard to detect because they are rapidly outcompeted before establish- 
ing, do not occur at high enough rates to impact the population dy- 
namics (Supplementary Information section 9.3). 


Distribution of establishment times 


For mutations that establish, t roughly corresponds to the time at which 
a beneficial mutation occurred, with an uncertainty of a few times 1/s 
due to variability in initial stochastic drift (Supplementary Information 
section 4.1). Establishment times are broadly distributed (−90 < t < 
48). Lineages containing beneficial mutations with very negative t 
(−90 < t < −40) are usually identified as adaptive in both replicates 
(Fig. 3a and Extended Data Fig. 3a, purple). Establishment times as negative 
as −90 generations are expected because of beneficial mutations 
that occur during the period of common growth (t < 0, Supplementary 
Information section 10.1). Indeed the number of pre-existing beneficial 
mutations is broadly consistent with the beneficial mutation rate we 
infer (Supplementary Information section 10). We observe very few 
mutations with t > 48 for the reasons that follow. 

A beneficial mutation with a fitness effect s that occurs in generation 
t will typically take another ~1/s generations to reach a size large 
enough to grow exponentially. If before this time the mean fitness has 
increased by more than s, the mutation will decline in frequency and 
never grow exponentially. Thus, there is a time limit after which a ben- 
eficial mutation that occurs is unlikely to establish (Fig. 3a and Extended 
Data Fig. 3a, larger dashed lines). This time limit is shorter for smaller s 
for two reasons: (1) small s mutations must drift to higher numbers in 
order to establish, and (2) the mean fitness of the population surpasses 
its fitness advantage in a shorter time. A mutation with s < 2% is there- 
fore extremely unlikely to establish because this limit is reached quickly. 
Thus, a fundamental lower limit on which fitness-effects can establish 
emerges from the population dynamics. 
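
The time-limit argument above can be illustrated with a small sketch. It assumes, purely for illustration, a mean fitness that rises linearly at 0.05% per generation (roughly "a few percent in ~80 generations"); this trajectory and the function names are hypothetical and are not taken from the paper's analysis. A mutation arising at generation t is treated as able to establish only if the mean fitness ~1/s generations later is still below s.

```python
def linear_mean_fitness(t, rate=0.0005):
    """Hypothetical mean-fitness trajectory: zero before t = 0, then a linear rise."""
    return max(0.0, rate * t)


def latest_occurrence_time(s, mean_fitness=linear_mean_fitness,
                           t_min=-200, t_max=400):
    """Latest generation at which a mutation of effect s can arise and still
    establish, using the rule of thumb from the text: the mutation needs
    ~1/s generations to establish and must still be fitter than the
    population mean at that point."""
    latest = None
    for t in range(t_min, t_max + 1):
        if mean_fitness(t + 1.0 / s) < s:
            latest = t
    return latest


if __name__ == "__main__":
    for s in (0.02, 0.05, 0.10):
        print(f"s = {s:.2f}: latest occurrence time ~ {latest_occurrence_time(s)}")
```

Under this toy trajectory the window closes earlier for smaller s, and for s = 2% it closes before t = 0, which matches the qualitative statement that such mutations are very unlikely to establish once the evolution is under way.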

In addition to establishing, beneficial mutations in our assay must
also grow to a large enough number to be detectable above the number
of neutral (ancestral) cells remaining in their lineages. This shortens the time
window in which a beneficial mutation must occur to be observed (Fig. 3a
and Extended Data Fig. 3a, smaller dashed lines and Supplementary
Information section 9). Beneficial mutations we are unable to detect
(those occurring close to, or after, the time limit) never reach sizes much
above their establishment number (1/s), are rapidly outcompeted, and
typically go extinct. Such mutations are unlikely to have a significant
impact on the population dynamics. Deleterious mutations are largely
irrelevant here: given the mean fitness increases by a few percent in
~80 generations, a deleterious mutation will not rise to high frequency
unless it occurs contemporaneously in a cell with a large beneficial
mutation, and even then is unlikely to reach high frequencies”.

effect, s, inferred from mutations that likely occurred after t = 0 (Supplementary
Information section 10.2). The y axis is the mutation rate density, so the
mutation rate to a range, Δs, is obtained by multiplying this density by Δs. The
total beneficial mutation rate to s > 5% is inferred to be ~1 × 10⁻⁶ and is
consistent across replicates. The observed spectrum is not exponential (grey
line, with the error range shaded). b, The distribution of the number of adaptive
cells binned by their fitness over time. As the mean fitness (grey curtain)
surpasses the fitness of a subpopulation, cells with that fitness begin to
decline in frequency.


Overall population dynamics 

Plotting the fitness distribution of all adaptive cells over time reveals 
that massive clonal interference underlies the population dynamics (Fig. 3b 
and Extended Data Fig. 3b). Many beneficial mutations (~20,000
observed in E1, ~11,000 observed in E2) of small s (2% < s < 5%, the
‘low fitness class’) drive the mean fitness early (t < 72), but begin to be
outcompeted by cells with larger s (~10%) that stem from fewer beneficial
mutations (~5,000 in E1 and ~3,000 in E2). For the first ~80
generations the mean fitness trajectory in both replicates is strikingly
similar (grey curtain, Fig. 3b and Extended Data Fig. 3b and Supplementary
Information section 6.5). However, by ~112 generations, the mean
fitness is being driven by ~100 of the most beneficial mutations (s >
10%). Because mutations to these higher fitness effects are rare, they
display stochastic establishment times that lead to differences in the 
mean fitness between the two replicates at late times (Supplementary 
Information section 6.5). In E2, these higher fitness mutations happen 
to establish earlier, contributing to a quicker decline in the low fitness 
class, and fewer observed adaptive lineages overall. By generation ~132,
we observe that the low fitness class has shrunk to a small fraction of 
the population. This, however, does not mean that cells in this class are 
inconsequential: they prevent mutations with even smaller s from es- 
tablishing. Because they are so numerous early in the evolution, some 
cells in this class are likely to accumulate additional beneficial muta- 
tions whose expansion could enable them to eventually outcompete cells 
that initially acquired higher s mutations.

Fitness effects that drive the early evolutionary dynamics in this large
population are a predictable consequence of the population size and
spectrum of mutation rates. The values of s at the highest frequency at
time t (those that are dominating the increase in mean fitness) are those
that maximize st + log(μ(s)), with μ(s) being the mutation rate to s
(Supplementary Information section 11.1). That is, the most important
fitness effects at a given time are determined by a balance between being
sufficiently probable to have established multiple times and sufficiently
fit to have grown to large cell numbers.
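
As a rough illustration of this balance, the sketch below evaluates st + log(μ(s)) over a small table of fitness effects and returns the maximizing s. The spectrum values are invented for the example (they are not the measured spectrum), the logarithm is assumed to be the natural log, and the function name is hypothetical.

```python
import math

# Hypothetical, illustrative mutation-rate spectrum: fitness effect s -> rate mu(s).
# These numbers are NOT the measured spectrum from the paper.
TOY_SPECTRUM = {0.03: 1e-5, 0.05: 3e-6, 0.08: 5e-7, 0.10: 1e-7, 0.12: 1e-8}


def dominant_fitness_effect(t, spectrum=TOY_SPECTRUM):
    """Return the s that maximizes s*t + log(mu(s)) at generation t."""
    return max(spectrum, key=lambda s: s * t + math.log(spectrum[s]))


if __name__ == "__main__":
    for t in (40, 100, 200):
        print(f"t = {t}: dominant s = {dominant_fitness_effect(t)}")
```

With these toy numbers the common, weakly beneficial class dominates early, and rarer, fitter classes take over as t grows, because the st term increasingly outweighs the penalty from a small μ(s).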

Adaptive lineages that accumulate an additional beneficial mutation
in a cell with an existing beneficial mutation (double mutants) can impact
the dynamics. However, double mutants are rare before ~168 generations
because most single mutants are not yet present at high enough
cell numbers to acquire a second mutation that establishes. We estimate
that fewer than ~50 of the inferred fitness effects (s) and establishment times (τ) are impacted by
double mutants (~0.2% of all adaptive lineages, Supplementary Infor- 
mation section 4.5). Ecological changes in the environment caused by 
mutants can result in frequency-dependent selection and impact the 
evolutionary dynamics. But, over the time range used to infer fitnesses
(up to ~100 generations), our observations are consistent with the simplifying
assumption that beneficial mutations have frequency-independent
fitness effects and thus subpopulations only interact via competition 
against the mean fitness (Fig. 2a and Extended Data Fig. 2a, insets). 
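
A minimal deterministic sketch of this simplifying assumption is given below: each subpopulation changes by a factor exp(s − mean fitness) per generation, so classes interact only through the mean. The three classes and their starting frequencies are hypothetical values chosen for illustration, and the model ignores drift and new mutations.

```python
import math


def compete(freqs, fitnesses, generations):
    """Deterministic competition of subpopulations against the mean fitness.

    Each class i changes by a factor exp(s_i - mean fitness) per generation,
    then frequencies are renormalized to sum to 1.
    Returns (frequency history, mean-fitness history)."""
    f = list(freqs)
    freq_history = [f]
    mean_history = []
    for _ in range(generations):
        mean_fitness = sum(fi * si for fi, si in zip(f, fitnesses))
        mean_history.append(mean_fitness)
        grown = [fi * math.exp(si - mean_fitness) for fi, si in zip(f, fitnesses)]
        total = sum(grown)
        f = [g / total for g in grown]
        freq_history.append(f)
    return freq_history, mean_history


# Toy starting point (hypothetical): ancestor, low-fitness class, high-fitness class.
FITNESSES = [0.0, 0.04, 0.10]
START = [1.0 - 1e-3 - 1e-5, 1e-3, 1e-5]

if __name__ == "__main__":
    history, means = compete(START, FITNESSES, 200)
    low = [f[1] for f in history]
    print(f"low-fitness class peaks near generation {low.index(max(low))}")
```

In this toy run the low-fitness class first expands and then declines once the mean fitness exceeds its own fitness, which is the behaviour described for the 'low fitness class' in the text.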


Discussion 

Tracking a large number of small lineages provides a granular view of 
evolutionary dynamics that is not possible by other methods’’. By 
focusing on sequencing just 0.002% of the genome, we gain almost five 
orders of magnitude in frequency resolution over genome sequencing 
approaches. This enables us to identify tens of thousands of independ- 
ent beneficial mutations, some of which never reach frequencies above 
~10-°. By contrast, our previous population sequencing approach’, 
which detected mutations at frequencies above ~ 1%, would have iden- 
tified only ~15 adaptive lineages in this study (Fig. 4, Supplementary 
Information section 9.4). Furthermore, barcode tracking yields estimates 
of the fitness effects and occurrence times for all changes that convey 
substantial fitness advantage, whether or not they are amenable to being 
identified via genome sequencing. 
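
One plausible reading of the "almost five orders of magnitude" figure is simple arithmetic: if a fixed sequencing budget is concentrated on 0.002% of the genome, the reads per sequenced site rise by the inverse of that fraction. The sketch below works through this under the assumption of uniform coverage; the function name is hypothetical and the paper may derive the figure differently.

```python
import math


def resolution_gain(fraction_of_genome_sequenced):
    """Fold increase in reads per site when a fixed sequencing budget is
    concentrated on a fraction of the genome (assumes uniform coverage)."""
    return 1.0 / fraction_of_genome_sequenced


gain = resolution_gain(0.002 / 100)      # 0.002% of the genome
orders_of_magnitude = math.log10(gain)   # about 4.7

if __name__ == "__main__":
    print(f"gain ~ {gain:,.0f}-fold, i.e. ~{orders_of_magnitude:.1f} orders of magnitude")
```

A gain of about 50,000-fold corresponds to roughly 4.7 orders of magnitude, consistent with the phrase "almost five".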

Our results show that in an asexually evolving population of ~10⁸
cells, a large number of independent beneficial mutations drive adap- 
tation. While individually each mutation is rare and occurs stochast- 
ically, collectively they have a predictable impact on the population 
dynamics. In large populations therefore, the early evolutionary dy- 
namics is almost deterministic: it only becomes stochastic when muta- 
tions so rare that they have occurred only a handful of times, or multiple 
mutations on the same genome, expand to an appreciable fraction of 
the population. Mutations with certain fitness effects play a far more 
important role in driving the dynamics than others, resulting in a subtle
interplay between deterministic and stochastic effects.

Figure 4 | The need for high frequency resolution. The fitness spectrum of
adaptive lineages in replicate E1 that could be identified within the first 100
generations at different frequency resolution thresholds.

High-resolution lineage tracking is a powerful tool to study many
questions important to evolution. Using this system across many envi- 
ronmental regimes, perhaps for longer periods of time than in this work, 
the relationships between adaptation rate, environment, and ecology 
could be quantitatively studied. A potential limitation of lineage track- 
ing is that barcode diversity will always diminish over time. However, 
the possibility of adding barcodes at different times over the course of 
an evolution could provide a means to overcome this. 

Cancer and microbial infections can have population sizes up to 
~10¹² cells in a single individual, suggesting that massive clonal interference
and complex population dynamics are likely to characterize
disease progression and drug resistance*'**. Although mutations that 
rise to high frequencies are often emphasized, much larger numbers of 
low frequency mutations could be at least as important for disease pro- 
gression or drug resistance. To study these low-frequency mutations, 
barcode tracking could be implemented in pathogenic microbes, cancer 
cell lines, or even animal tumour models*™. Indeed, lineage tracking 
has the potential to identify the treatment regimes that most effectively 
slow the rate of adaptation. By randomly picking clones and sequen- 
cing their barcodes, one can cheaply identify many clones belonging to 
independent adaptive lineages. By sequencing the genomes of these 
clones, the mutational determinants for a broad range of beneficial fit- 
ness effects can be discovered. In combination with whole genome 
sequencing, lineage tracking therefore offers a powerful method by 
which to characterize the mutational spectrum underlying evolution, 
disease progression and drug resistance. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

trajectory from replicate E2. b, The distribution of lineage sizes over time,
for lineages that begin with ~100 ± 2 cells (vertical line). Adaptive lineages
(red) begin to expand above the neutral expectation (black curve) and push

trajectory of this lineage in E1 (unadaptive, blue circles) and E2 (adaptive,
red circles) compared with the predicted trajectory with largest probability in
E1 (blue line) and E2 (red line).
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Extended Data Figure 3 | Fitness effects and establishment times for 
replicate E2. a, Scatter plot of τ and s of all ~14,000 beneficial mutations
(circles) identified in E2. Circle area represents the size of the lineage at
generation 88. Purple circles indicate lineages with mutations that occurred
in the period of common growth (t < 0) that were sampled into, and established
in, E1 and E2. Green circles indicate lineages that were identified as adaptive in
only one replicate and likely contain mutations that arose after t = 0. Lines
indicate the time limits before which mutations must occur in order to establish
(large dash) or be observed (small dash). These limits trail the mean fitness
(solid line) by ~1/s generations. Inset, the spectrum of mutation rates, μ(s), as a
function of fitness effect, s, inferred from mutations that likely occurred after
t = 0 (Supplementary Information section 10.2). The y axis is the mutation
rate density, so the mutation rate to a range, Δs, is obtained by multiplying
this by Δs. The total beneficial mutation rate to s > 5% is inferred to be
~1 × 10⁻⁶ and is consistent across replicates. The observed spectrum is not
exponential (grey line, with the error range shaded). b, The distribution of
the number of adaptive cells binned by their fitness over time. As the mean
fitness (grey curtain) surpasses the fitness of a subpopulation, cells with that
fitness begin to decline in frequency.
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Notum deacylates Wnt proteins to 
suppress signalling activity 


Satoshi Kakugawa, Paul F. Langton, Matthias Zebisch, Steven A. Howell, Tao-Hsin Chang, Yan Liu, Ten Feizi,
Ganka Bineva, Nicola O’Reilly, Ambrosius P. Snijders, E. Yvonne Jones & Jean-Paul Vincent


Signalling by Wnt proteins is finely balanced to ensure normal development and tissue homeostasis while avoiding 
diseases such as cancer. This is achieved in part by Notum, a highly conserved secreted feedback antagonist. Notum has 
been thought to act as a phospholipase, shedding glypicans and associated Wnt proteins from the cell surface. However, 
this view fails to explain specificity, as glypicans bind many extracellular ligands. Here we provide genetic evidence in 
Drosophila that Notum requires glypicans to suppress Wnt signalling, but does not cleave their glycophosphatidylinositol 
anchor. Structural analyses reveal glycosaminoglycan binding sites on Notum, which probably help Notum to co-localize 
with Wnt proteins. They also identify, at the active site of human and Drosophila Notum, a large hydrophobic pocket 
that accommodates palmitoleate. Kinetic and mass spectrometric analyses of human proteins show that Notum is a 
carboxylesterase that removes an essential palmitoleate moiety from Wnt proteins and thus constitutes the first known
extracellular protein deacylase.


Negative feedback characterizes biological signalling’ and although often 
cell-intrinsic, is also mediated by secreted proteins. Cell- and non-cell- 
autonomous feedbacks modulate signal transduction by Wnt proteins, 
a class of secreted proteins characterized by the presence of palmitoleic 
acid appended on a conserved serine”*. This palmitoleic acid moiety is 
essential for signalling***, contributing to interaction with Frizzled re- 
ceptors**”. Canonical Wnt signalling triggers expression of intracellular, 
extracellular and membrane-localized inhibitors of the pathway. Secreted 
inhibitors include Dickkopf (Dkk) family members, which bind to the 
extracellular domain of the Wnt co-receptor low-density-lipoprotein- 
receptor-related protein 5/6 (Lrp5/6), as well as Wnt inhibitory factor 
1 (Wif1) and secreted Frizzled receptor proteins (Sfrp), which seques- 
ter Wnt proteins®. Tiki is a membrane-bound protease that cleaves the 
amino-terminal region of Wnt ligands’. Notum is also thought to act 
enzymatically'®”* but on glypicans, a class of heparan sulfate proteo- 
glycans (HSPGs) implicated in the extracellular stabilization, movement, 
and/or surface retention of Wnt proteins, as well as of other signalling 
ligands’*-"*. 

Notum orthologues are found in metazoans from planarians to 
humans and all bear the hallmark Ser-His-Asp catalytic triad of
α/β-hydrolases’®"". The sequence similarity of Notum to plant pectin
acetylesterases prompted the early suggestion that it could hydrolyse 
glycosaminoglycan (GAG) chains of glypicans’*”’, thus affecting their 
ability to interact with Wnt ligands and somehow modulating signal- 
ling activity. It was subsequently reported that Notum triggers glypican 
shedding from cultured cells, perhaps by cleaving their glycosylpho- 
sphatidylinositol (GPI) anchor'*”’. Indeed, the currently accepted view 
is that Notum is a glypican-specific phospholipase’’. However, glypican- 
based interactions also modulate Dpp (Drosophila TGF-β), Hedgehog,
and fibroblast growth factor, as well as Wingless signalling” *. One 
would expect therefore that these pathways would also be sensitive to 
Notum-induced glypican release. Yet, existing evidence suggests that 
Notum is primarily a feedback inhibitor of Wnt signalling. In planarian
worms, Drosophila, zebrafish and hepatocarcinomas, notum expression
is activated by Wnt signalling and, conversely, Notum seems to preferentially
suppress Wnt signalling’®''**!. Because more pleiotropic
effects would be expected from an enzyme that targets glypicans, we 
felt compelled to reassess Notum’s specificity and mode of action. 


Notum specifically inhibits Wnt signalling 

To investigate the specificity of Notum systematically, we analysed its 
effects on Drosophila wing imaginal discs, which require Wingless (the 
main Drosophila Wnt), Dpp and Hedgehog for patterning and growth”. 
As expected, overexpression of Drosophila Notum (dNotum) through- 
out the dorsal compartment prevented expression of senseless, a gene 
normally activated by high level Wingless signalling. By contrast, patched 
(ptc), a Hedgehog target gene”, was unaffected (Fig. 1a, b) and phospho- 
Mad immunoreactivity, a marker of Dpp signalling”, was only mildly 
reduced (Extended Data Fig. 1a, b). Loss-of-function assays, in homozygous
notum knockout (notum^KO) tissue, confirmed the specificity
of dNotum to Wingless signalling (Fig. 1c and Extended Data Fig. 1c).
Although complete loss of notum was lethal, strong hypomorphic animals
(notum!/notum^KO) survived to adulthood. The wings of such animals
had supernumerary margin bristles, consistent with excess Wingless 
signalling, but had no defects indicative of impaired Hedgehog or Dpp 
signalling (Extended Data Fig. 1d—g). Nevertheless, extensive evidence 
suggests that glypicans contribute to these two signalling pathways’”~°. 
This is difficult to reconcile with the apparent specificity of Notum if it 
acts as a glypican-specific phospholipase. 


Notum does not cleave the GPI anchor of glypicans 

One previously reported observation, namely that dNotum inhibits sig- 
nalling by membrane-tethered (that is, shedding-resistant) Wingless"! 
(Extended Data Fig. 2a, b), is incompatible with the view that Notum is 
a glypican-specific phospholipase. In addition, genetic removal of the 
two Drosophila glypicans Dally and Dally-like protein (Dlp) did not
abrogate high level Wingless signalling (Fig. 1d). These two sets of data
strongly suggest that glypican shedding is unlikely to account for the
inhibitory effect of dNotum on Wingless signalling. Indeed, we could
not reproduce the results of an earlier phase partition assay, which suggested
that Notum increases the water solubility of glypicans, as expected
from GPI cleavage’. Extracts from cells expressing tagged Dlp or Dally
were treated with either dNotum or bacterial phosphoinositide phospholipase
C (PIPLC), an enzyme known to cleave GPI anchors. PIPLC
caused both glypicans to partition almost exclusively in the aqueous
phase, but dNotum did not (Fig. 1e for Dlp; Extended Data Fig. 2c for
Dally), even though, under these conditions, it was effective at inhibiting
signalling (Extended Data Fig. 2d). Likewise, in imaginal discs, Notum
did not mimic PIPLC: whereas extracellular Dlp and Dally were noticeably
reduced after addition of exogenous PIPLC, overexpression of
dNotum had no such effect (Extended Data Fig. 2e–l). Therefore, experiments
with cultured cells and imaginal discs suggest that Notum is not
a glypican-specific phospholipase.

Figure 1 | Notum specifically inhibits Wnt signalling. a, b, Overexpression
of V5-tagged dNotum (Notum-V5) with the apterous-Gal4 driver, which is
expressed in the dorsal compartment, prevents expression of Senseless (Sens)
but not that of Patched (Ptc) (b). c, Loss of notum activity, achieved by
generating large patches of notum^KO tissue (see Methods), marked by the loss of
green fluorescent protein (GFP), leads to broadening of Senseless expression
but does not affect Patched expression. As in all subsequent confocal images,
third instar wing imaginal discs are shown with posterior to the right and dorsal
up. d, Senseless is expressed seemingly normally in large patches of dlp dally
mutant cells (GFP-negative). e, Western blot (co-stained with anti-V5 and
anti-haemagglutinin (HA)) of phase-separated extracts of S2 cells transfected
with a plasmid expressing HA-tagged Dlp (Dlp-HA). In control extracts, Dlp
(arrowhead) is found equally in the detergent (D) and aqueous (A) phases.
Coexpression of dNotum-V5 (asterisk) had no impact while treatment with
PIPLC shifted Dlp to the aqueous phase.


Glypicans contribute to the activity of Notum 


Although dNotum does not seem to modulate Wingless signalling by 
cleaving the GPI anchor of glypicans, genetic interactions between
notum and dlp suggest a functional relationship’'”’. We therefore investigated
the role of Dlp or Dally in the ability of dNotum to suppress Wingless
signalling. dNotum overexpression, along the anterior–posterior
(A-P) boundary, led to complete and long-range suppression of Senseless 
expression (Fig. 2a). In the absence of either Dlp or Dally, this activity 
was very much reduced, as indicated by the recovery of endogenous 
Senseless expression (Fig. 2b, c). Notably, Dally was also required for 
Notum to suppress signalling by membrane-tethered Wingless (Extended 
Data Fig. 3a). Because Dally is not essential for survival, its requirement 
for Notum’s ability to suppress Wingless signalling could be confirmed
in adult wings (Extended Data Fig. 3b–d). To address the relevance of
glypican GPI anchorage, we created a transgene expressing Dlp-CD8
(34 carboxy-terminal amino acids of Dlp replaced by the CD8 trans- 
membrane domain) under control of the tubulin promoter. This trans- 
gene restored the ability of overexpressed dNotum to repress Wingless 
signalling in dlp mutant homozygotes (Fig. 2d; compare to Fig. 2b), 
confirming the importance of glypicans but not their GPI anchor. 
Glypicans bear sulfated glycans. In Drosophila, sulfation of the sugar 
chains requires Sulfateless, a GlcNAc N-deacetylase/N-sulfotransferase 
(NDST)*3, which can be knocked down in vivo with an RNA interfer- 
ence (RNAi) transgene. Gal4 was used to express this transgene specif- 
ically in the posterior compartment, leaving the anterior compartment 
as a control. At the same time, a dpp-LexA driver was used to over- 
express dNotum along the A-P boundary. Overexpressed dNotum inhib- 
ited Senseless expression in the control compartment but not in the 
territory deficient in sulfateless activity (Fig. 2e). Therefore, sulfation 
of HSPGs is needed for dNotum to act. Notably, overexpressed 
dNotum did not accumulate in the compartment expressing the sul- 
fateless RNAi transgene (compare Fig. 2e to Fig. 2a, right panels). Like- 
wise, dNotum was depleted from the surface of dally dlp double-mutant 
cells generated by mitotic recombination (Fig. 2f). These findings sug- 
gest that Dally and Dlp retain dNotum at the cell surface through inter- 
action with their sulfated glycans. Indeed, dNotum bound specifically 
to sulfated glycans on a glycan array (Extended Data Fig. 4). In addition,
surface plasmon resonance (SPR) showed that recombinant human (h)
NOTUMcore (Ser 81–Thr 451, Cys330Ser) bound to heparin and heparan
sulfate with micromolar affinities. The dissociation constant of a complex
comprising hNOTUMcore and human glypican-3 (GPC3 Pro 31–
Asn 538) was 104 μM (Fig. 3a). Consistent with the Drosophila genetic
data, this binding relies largely on the two sulfated GAG chains in 
GPC3 as their removal led to a more than fivefold reduction in affinity 
(Fig. 3a). We conclude that sulfated GAG chains on glypicans prob- 
ably mediate their interaction with Notum. 
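
To give a feel for what a fivefold loss of affinity means, the sketch below uses the standard 1:1 equilibrium binding expression, occupancy = [L] / (Kd + [L]). Treating the Notum–glypican interaction as simple 1:1 binding is an assumption made here for illustration, and the concentrations are arbitrary example values rather than measurements.

```python
def fraction_bound(ligand_conc, kd):
    """Equilibrium occupancy for simple 1:1 binding: [L] / (Kd + [L]).
    ligand_conc and kd must be in the same units."""
    return ligand_conc / (kd + ligand_conc)


if __name__ == "__main__":
    kd = 100.0         # arbitrary illustrative Kd (e.g. in micromolar)
    ligand = 100.0     # ligand concentration equal to Kd
    print(fraction_bound(ligand, kd))        # half occupancy at [L] = Kd
    print(fraction_bound(ligand, 5 * kd))    # occupancy after a fivefold rise in Kd
```

At a ligand concentration equal to the original Kd, a fivefold increase in Kd lowers the occupancy from one half to one sixth in this simple model.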


Structure-guided identification of GAG-binding sites


The above results indicate that glypicans contribute to Notum activity 
by localizing it at the cell surface, but are unlikely to be the target of 
Notum’s enzymatic activity. What could the target be? We started to 
address this question by solving the structures of hNOTUMcore (in nine
crystal forms at resolutions between 1.4 and 2.8 Å; see Supplementary
Information) and of dNotumΔloop (in two crystal forms at resolutions
of 2.4 and 1.9 Å) (Fig. 3b, Extended Data Fig. 5 and Supplementary
Information). The structures exhibit a canonical α/β-hydrolase fold**,
as predicted'*"". The conserved eight-stranded central β-sheet is extended
on both sides by strands 4 and 14 and is flanked by the canonical six
α-helices. This single domain topology is further extended by additional
α-helices, two very short β-sheets, several long loops and seven
stabilizing disulfides. The catalytic triad comprises Ser 232, Asp 340
and His 389 (hNOTUM residue numbering). 

Seven sulfate binding sites were identified in hNOTUMcore crystal
form III (Fig. 3c and Extended Data Fig. 6). Among them, one (sulfate 1) 
was found by SPR to contribute substantially to heparin-Notum inter- 
actions (Fig. 3d). In addition, co-crystals with short heparin oligomers 
or sucrose octasulfate (SOS), a heparin mimic, were generated and analysed.
These structural studies and additional biophysical analyses (described
in Supplementary Information and illustrated in Extended Data
Fig. 6) delineated an extensive GAG-binding patch centred on a basic
groove between the top of the β-sheet and helix αK (Fig. 3c). Importantly,
the GAG-binding surface on Notum is distant from the catalytic
triad, consistent with our earlier evidence that Notum binds to
glypicans, but does not act on them enzymatically.

©2015 Macmillan Publishers Limited. All rights reserved

[Figure 2 panel labels: dpp > notum-V5; dpp > notum-V5, dally mutant; dpp > notum-V5, dlp mutant; dpp > notum-V5, en > sulf-RNAi; ptc > notum-V5, dlp dally clones]


Evidence for carboxylesterase activity 


The α/β-hydrolase superfamily includes proteases, lipases, esterases,
dehalogenases, peroxidases and epoxide hydrolases™. To identify which
of these activities relate most closely to the activity of the Notum protein,
we compared the structure of hNOTUM to those of all known α/β-hydrolases
(PDBeFold server*’). The search returned many weak
homologues, including human esterase D** and acyl-protein thioesterase
1 (APT1)*’ (Extended Data Fig. 7a). A structure-based search for
function using the ProFunc Server*® also suggested that Notum is a
carboxylesterase. Furthermore, the closest non-animal homologues of
Notum, the pectin acetylesterases of angiosperms (22% sequence identity
to hNOTUM, Extended Data Fig. 7b) are carboxylesterases. We assessed
the functional significance of these observations by measuring the activity
of hNOTUMcore on p-nitrophenyl (pNP) acetate (pNP2), a chromogenic
carboxylesterase substrate”. Pronounced activity could be
detected (Fig. 4a). This activity was strongly inhibited by Triton X-100,
and by phenylmethanesulfonyl fluoride (PMSF), a compound known
to covalently modify the catalytic serine of serine esterases and proteases
(Extended Data Fig. 8a, b). By contrast, there was no measurable
hNOTUM activity on representative sulfatase, phosphatase, phospholipase
C or amidase/protease substrates (Fig. 4a). Addition of SOS or
heparin resulted in a modest increase in Notum carboxylesterase activity
(Extended Data Fig. 8a). The possibility that GAGs also contribute
to Notum function by allosteric activation requires further investigation.

Figure 2 | Notum requires the GAGs of glypicans
to inhibit Wingless signalling. a–c, Ectopically
expressed dNotum-V5 does not suppress Senseless
expression in the absence of dlp (b) or dally (c).
Although dNotum is only expressed in a vertical
band along the A-P boundary, it spreads along the
whole A-P axis. d, Ectopic dNotum represses
Senseless expression in dlp mutants
that express Dlp-CD8 (tubulin promoter).
e, Expression of an RNAi transgene against
sulfateless (sulf-RNAi) in the posterior
compartment prevents dNotum-V5 (expressed
from dpp-lexA lex-op-notum-V5) from being
retained at the cell surface and from suppressing
Senseless expression. Wingless signalling is still
suppressed in the anterior compartment.
f, Accumulation of dNotum-V5 is reduced
at the surface of dlp dally double-mutant tissue
(GFP-negative).

As a secreted carboxylesterase that inhibits Wnt signalling, Notum
is likely to target a carboxy-oxoester or carboxy-thioester bond present
on an extracellular component of the Wnt signal transduction machinery.
The linkage between Wnt and palmitoleic acid is the only such
chemical bond described to date, suggesting that Notum could target
Wnt proteins themselves. To evaluate this possibility, we treated mouse
(m)Wnt3A with recombinant hNOTUMcore for specific durations,
removed the hNOTUMcore and used a cell-based luciferase assay to measure
signalling activity of the remaining mWnt3A. This showed that
hNOTUM inactivated mWnt3A directly, irreversibly and in a time-dependent
manner (Fig. 4b), while no such effect could be detected on
Norrin, a non-lipidated ligand that also acts via the Wnt receptors”
(Extended Data Fig. 8c).

Figure 3 | hNOTUM structure and GAG
binding. a, Binding of hNOTUMcore to
immobilized heparin, heparan sulfate
(HeparanSulf), hGPC3 or hGPC3ΔGAG, assayed by
SPR. b, Structure of hNOTUM. β-strands are
numbered and α-helices are labelled alphabetically
from N to C terminus (NT and CT, respectively).
Disulfides are shown in orange, catalytic triad
residues as sticks and the active site pocket shaded
grey. Asn 96 is glycosylated (also in dNotum).
c, Heparin-mimicking ligands from three different
structures are plotted onto a surface representation
coloured by electrostatic potential from red
(−8 k_BT/e_c) to blue (+8 k_BT/e_c). Close-up views of
binding sites are shown on the right with
experimental omit electron density contoured at
2.0σ. d, SPR assay measuring hNOTUMcore variant
binding to immobilized heparin. Mutation of
the heparin disaccharide binding site (Arg115Ser;
hNOTUMΔhep.unit) had little effect while
mutations in the sulfate binding site 1 (Arg409Gln,
His412Asn and Arg416Gln; hNOTUMΔsulfate1)
strongly reduced binding. For SPR (a, d), each data
point is the mean result of two replicates.

Figure 4 | Enzymatic activity of hNOTUM. a, Activity of hNOTUMcore and
its Ser232Ala variant on p-nitrophenyl (pNP) acetate (pNP2) and activity of
hNOTUMcore on other chromogenic substrates. pNAA, p-nitroacetanilide
(amidase/protease substrate); pNPP, pNP-phosphate (phosphatase substrate);
pNPPC, pNP-phosphorylcholine (phospholipase C substrate); pNPS,
pNP-sulfate (sulfatase substrate). b, mWnt3A inactivation by hNOTUM. After
the indicated time (in hours), hNOTUMcore or its Ser232Ala variant was
removed with cobalt affinity beads and residual Wnt3A activity measured
with TOPFlash. PC denotes no hNOTUM removal. Results are normalized to
those from identically treated mock samples. c, Activity of hNOTUM and
hAPT1 on chromogenic p-nitrophenyl ester substrates of different lengths.
d, Inhibition of hNOTUM by various carboxylic acids. pNP8 was used as
substrate at a concentration of 1 mM, as were the carboxylic acids. c or t denote
cis or trans C9-C10 double bond. All graphs show the mean ± s.d. (n = 4).

Remarkably, the Notum crystal structures revealed a large (~380 Å³),
hydrophobic pocket adjacent to the catalytic triad (Fig. 3b, c). Com- 
putational docking showed that this pocket could accommodate long- 
chain fatty acids of up to 16 carbon atoms (C16). The size restriction 
imposed on saturated fatty acids was functionally assessed by measuring 
hNOTUM enzymatic activity on commercially available saturated 
chromogenic pNP ester substrates of varying chain lengths. The activ- 
ity of human APT1, a cytosolic thio- and oxoesterase was measured in 
parallel for comparison. hNOTUM had a pronounced preference for
pNP8 (Fig. 4c), with a micromolar Michaelis constant (Extended Data 
Fig. 8d, e). The activity for pNP-palmitate (pNP16) was less than 0.2% 
of that for pNP8. To extend our studies of hNOTUM specificity beyond 
commercially available substrates, we used a competitive inhibition 
assay, using pNP8 as substrate. Saturated 8-12 carbon (C8-C12) long 
linear carboxylic acids inhibited activity (Fig. 4d) while longer saturated 
fatty acids had no effect. Interestingly, however, strong inhibition was 
observed with the Wnt-associated cis-unsaturated lipids myristoleic 
(C14) and palmitoleic acid (C16) (Fig. 4d and Extended Data Fig. 8f), 
but not with palmitelaidic acid, the trans isomer of the 16:1 fatty acid
(Fig. 4d). These results confirm that Notum can bind to C14 and C16
carboxylic acids if they contain a C9-C10 cis double bond and therefore
might hydrolyse the oxo-ester bond linking palmitoleate or myristoleate
to Wnt proteins.


Notum deacylates Wnt proteins 


To test directly for Notum-mediated Wnt deacylation, we turned to liquid
chromatography-mass spectrometry (LC-MS) analysis. mWnt3A
was purified from conditioned medium, treated with recombinant
hNOTUMcore or a mock solution, differentially isotope labelled, and
trypsinised. No notable identification could be obtained for the predicted 
palmitoleoylated tryptic peptide, indicating incompatibility with the 
LC-MS conditions. After treatment with hNOTUM, however, this pep- 
tide could be identified and quantified in non-acylated form (Fig. 5a, b 
and Extended Data Fig. 9a, b). Replicate LC-MS measurements and 
label reversal consistently showed an increase in signal intensity for the 
hNOTUM-treated de-acylated peptide whereas control peptides were 
largely unaffected by hNOTUM treatment (Extended Data Fig. 9c, d). 
This suggests that treatment of mWnt3A with hNOTUM removes the
palmitoleic acid moiety, thus rendering the relevant peptide more
hydrophilic and detectable by LC-MS. Encouraged by these results, we
proceeded to assess the activity of hNOTUM on synthetic peptides. The
predicted tryptic peptide from hWNT3A was synthesized in a
disulfide-bonded form with a palmitoleate group on the relevant serine
(Supplementary Information). These peptides were treated with recombinant
hNOTUMcore or with hNOTUMcore(S232A), which is predicted to be
enzymatically inactive, and the reaction products were analysed by
matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF)
mass spectrometry. No significant deacylation was detected in
hNOTUMcore(S232A)-treated samples, whereas hNOTUMcore-treated peptides
were found to be extensively deacylated (Fig. 5c and Extended Data
Fig. 9e). We conclude from these assays that Notum catalyses the removal
of palmitoleic acid, which is normally O-linked to Ser 209 of hWNT3A.
We also assayed the effect
of hNOTUM on a synthetic peptide from human Sonic Hedgehog 
(SHH), which is N-palmitoylated at the amino terminus. No change
in the level of acylation could be detected (Fig. 5d and Extended Data 
Fig. 9f), confirming that the activity of Notum on Wnt is specific, in 
agreement with our genetic evidence. 

To gain structural insight into Wnt-Notum recognition, we
co-crystallized inactive hNOTUMcore(S232A) with a palmitoleoylated
disulfide-bonded peptide corresponding to hWNT7A(Cys 202-Cys 209).
The crystal structure revealed the palmitoleoyl group occupying the 
active site pocket (Fig. 5e and Extended Data Fig. 9g). Electron density 
was also evident for the ester bond. No interpretable density was found 
for the peptide, probably owing to disorder. This apparent lack of
interaction with the peptide concurs with the general observation that
esterases/lipases of the α/β-hydrolase family bind only to the acid part of
the ester substrate. The carboxylic acid carbon is 3.3 Å from the Cβ of the
mutated serine nucleophile, a distance consistent with ideal positioning
of the hydroxyl for nucleophilic attack. Classically, esterase-catalysed
hydrolysis proceeds through a tetrahedral transition state characterized 
by a negatively charged carbonyl oxygen stabilized by two canonical 
backbone amides, the oxyanion hole. In hNOTUM, the Gly 127-
Trp 128 amide participates in formation of the oxyanion hole in addi- 
tion to the canonical Ser 232-Ala 233 and Gly 126-Gly 127 amides, 
thereby providing optimal stabilization during the transition state (Ex- 
tended Data Fig. 9g). The kinked cis double bond (C9-C10) of the acyl 
tail is positioned at the base of the pocket between Ile 291, Phe 319 and 
Phe 320. We found a similar binding mode for a hNOTUM-myristoleate 
crystal structure (Extended Data Fig. 9h). Thus, the binding pocket 
can accommodate extended carbon tails up to C8/C10 but longer fatty 
acid chains must be kinked at this point in order to fit in. Saturated 
fatty acids generally adopt an extended conformation, explaining the 
preference of Notum for palmitoleate and myristoleate (both cis- 
unsaturated lipids kinked at C9-C10) over palmitate and myristate. The 
pocket entrance (lined by Ser 232 and His 389) is relatively narrow, but
comparisons of all hNOTUM structures suggest substantial flexibility,
compatible with palmitoleate entry and release (Extended Data Fig. 5b).
Therefore, crystallographic evidence strengthens our observation that
Notum is a Wnt-specific deacylase with preference for cis-unsaturated
long-chain lipids.

Figure 5 | Wnt deacylation by Notum. a, LC-MS analysis of mWnt3A
protein treated with hNOTUMcore or a mock solution. By comparison to mock
treatment (light label), addition of hNOTUM (heavy label) caused a significant
increase in the signal intensity of unlipidated CHGLSGSCEVK. b, LC-MS
peak areas from a, shown as mean ± s.e.m. (n = 2). c, d, Quantification from
MALDI analysis of synthetic lipid-bearing peptides treated with hNOTUMcore
or its Ser232Ala variant. Bars (grey denotes lipidated; white denotes
delipidated) show mean ± s.e.m. (n = 3). Palmitoleoylated hWNT3A peptide,
but not palmitoylated hSHH peptide, was specifically deacylated by the
wild-type enzyme. e, Close-up view of the seryl-palmitoleate active site
complex of hNOTUM. The experimental omit electron density is contoured
at 2σ. f, Feedback control by Notum. Notum deacylates Wnt in a
glypican-assisted fashion.


Discussion 

Only a small number of secreted proteins (Wnts, Hedgehogs and
Ghrelins) are known to be acylated. In all cases, this post-translational
modification is essential for activity and is carried out by dedicated
membrane-bound O-acyl transferases (MBOATs). Porcupine, the Wnt
MBOAT, appends palmitoleate and shorter cis-unsaturated fatty acids
onto Wnt. We have shown here that Notum specifically deacylates
Wnt (Fig. 5f) and is thus the first enzyme known to deacylate an
extracellular protein. The specificity of Notum can be traced to the shape of its
hydrophobic pocket, which can accommodate cis-unsaturated fatty 
acids such as myristoleate and palmitoleate, and the nature of its enzy- 
matic activity, a carboxyl oxoesterase. These characteristics ensure that 
Notum preferentially acts on Wnt proteins, the only secreted proteins 
known to be O-palmitoleoylated on a serine residue. Notum enzymatically
inhibits signalling activity by removing the palmitoleate moiety
of Wnt proteins, which contributes directly to receptor binding. Notum
could also interfere non-catalytically with the formation of the Wnt- 
Frizzled complex by sequestering the palmitoleate moiety as overex- 
pressed dNotum(S237A) mildly suppressed Wingless signalling in vivo 
(data not shown). We have found that glypicans are required for Notum 
function and that Notum binds to the sulfated GAGs of HSPGs. Glypicans 
can have stimulatory roles in Wnt signalling. However, in the presence
of Notum, we suggest that glypicans are also inhibitory by acting
as a scaffold that co-localizes Notum and its substrate (Wnts) at the cell 
surface (Fig. 5f). 

Our results point to Notum’s physiological targets being exclusively 
Wnt family members. Notum is the only secreted Wnt feedback inhi- 
bitor found across the metazoan kingdom, from planarians to humans, 
although it is seemingly absent from Caenorhabditis elegans. Notum’s 
Wnt-deacylation activity, along with other means of feedback inhibition
such as ligand sequestration, receptor blocking, receptor downregulation
and proteolytic degradation, undoubtedly contributes
to the fine balancing of Wnt signalling both during development, for
cell fate specification, and in adults, for example, for stem cell
maintenance. Indeed, insufficient or excessive Wnt signalling has been
associated with diseases such as neurodegeneration or cancer, respectively.
Our binding data suggest that Notum could possibly be modulated by 
dietary cis-unsaturated fatty acids. Moreover, because Notum is an extra- 
cellular enzyme with a well-defined and large active site pocket, it is 
probably amenable to chemical inhibition to alleviate conditions asso- 
ciated with insufficient Wnt signalling. Conversely, recombinant Notum 
could be considered as a therapeutic agent to prevent excess Wnt sig- 
nalling such as in Wnt-driven cancers. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Immunostaining and microscopy. The following primary antibodies were used: 
guinea-pig anti-Senseless (1:1,000, gift from H. Bellen), mouse anti-Patched (1:50, 
Hybridoma bank), rabbit anti-V5 (1:500, Abcam), mouse anti-V5 (1:500,
Invitrogen), rabbit anti-p-Smad3 (1:500, Epitomics), mouse anti-Dlp (1:50, Hybridoma
bank), rabbit anti-GFP (1:500, Abcam), mouse anti-Wingless (1:200, Hybridoma 
bank). Secondary antibodies used were Alexa 488, Alexa 555 and Alexa 647 (1:500, 
Molecular Probes). Total and extracellular immunostaining of imaginal discs was 
performed as previously described. Imaginal discs were mounted in Vectashield
with 4′,6-diamidino-2-phenylindole (DAPI; Vector Laboratories) and imaged using
a Leica SP5 confocal microscope. Confocal images were processed with ImageJ
(NIH) and Photoshop CS5.1 (Adobe). All confocal images show a single confocal 
section. Adult wings were mounted in Euparal (Fisher Scientific) and imaged with 
a Zeiss Axiophot2 microscope with an Axiocam HRC camera. Adult wing size and
L3-L4 intervein distance were measured with ImageJ.

Drosophila husbandry and clone induction. All crosses were performed at 25 °C
except those to generate discs shown in Figs 1a, b, 2f and Extended Data Figs 1a, b
and 2g, h, k, l, in which larvae were reared at 18 °C, the Gal80ts permissive
temperature, and then shifted to 29 °C, the restrictive temperature, 16 h before
dissection to induce UAS-notum-V5 expression. To generate mutant clones, larvae were
heat-shocked for 1 h at 37 °C at 60 h (±12 h) after egg laying, except for the cross to
generate the disc shown in Fig. 2f, which was heat-shocked for 1 h at 37 °C at 84 h
(±12 h) after egg laying. Large mutant clones were generated by including a Minute
mutation on the homologous chromosome.

Drosophila genotypes. The following Drosophila genotypes were used:
CyO / UAS-notum-V5 ; tub::Gal80ts / + (Fig. 1a); ap-Gal4 / UAS-notum-V5 ; tub::Gal80ts / +
(Fig. 1b); yw hs-FLP ; notumKO FRT2A / Ubi::GFP M FRT2A (Fig. 1c); yw hs-FLP ;
dallyMH32 dlpMH20 FRT2A / Ubi::GFP M FRT2A (Fig. 1d); UAS-notum-V5 / + ;
dpp-Gal4 / + (Fig. 2a); UAS-notum-V5 / + ; dpp-Gal4 dlpMH20 / dlpMH20 FRT2A
(Fig. 2b); UAS-notum-V5 / + ; dpp-Gal4 dallyMH32 / dallyMH32 FRT2A (Fig. 2c);
UAS-notum-V5 / tub::dlp-CD8 ; dpp-Gal4 dlpMH20 / dlpMH20 FRT2A (Fig. 2d);
lexOP-notum-V5 / en-Gal4 UAS::GFP ; dpp-lexA / UAS-sulf-RNAi (Fig. 2e); yw hs-FLP
/ tub::Gal80ts ; UAS-notum-V5 / ptc-Gal4 UAS::GFP ; dallyMH32 dlpMH20 FRT2A /
Ubi::GFP FRT2A (Fig. 2f); CyO / UAS-notum-V5 ; tub::Gal80ts / + (Extended
Data Fig. 1a); ap-Gal4 / UAS-notum-V5 ; tub::Gal80ts / + (Extended Data Fig. 1b);
yw hs-FLP ; notumKO FRT2A / Ubi::GFP M FRT2A (Extended Data Fig. 1c); notum!!
Ubi::GFP FRT2A / MKRS (Extended Data Fig. 1d); notumKO FRT2A / notum'
Ubi::GFP FRT2A (Extended Data Fig. 1e); UAS-NRT-Wg / + ; dpp-Gal4 / +
(Extended Data Fig. 2a); UAS-NRT-Wg / + ; dpp-Gal4 / UAS-notum (Extended
Data Fig. 2b); dally-GFP (protein-trap, DGRC 115-064) (Extended Data Fig. 2e, f, i, j);
CyO / UAS-notum-V5 ; tub::Gal80ts / + (Extended Data Fig. 2g); ap-Gal4 /
UAS-notum-V5 ; tub::Gal80ts / + (Extended Data Fig. 2h); CyO / UAS-notum-V5 ;
tub::Gal80ts / dally-GFP (Extended Data Fig. 2k); ap-Gal4 / UAS-notum-V5 ;
tub::Gal80ts / dally-GFP (Extended Data Fig. 2l); UAS-NRT-Wg / + ; UAS-notum
dallyMH32 / dpp-Gal4 dallyMH32 (Extended Data Fig. 3a); UAS-notum-V5 /
UAS-notum-V5 (Extended Data Fig. 3b); sal-Gal4 / UAS-notum-V5 (Extended Data
Fig. 3c); sal-Gal4 / UAS-notum-V5 ; dallyMH32 FRT2A / dallyMH32 FRT2A (Extended
Data Fig. 3d).

Generation of notum knockout by homologous recombination. notumKO was
generated by homologous recombination using reagents and crossing schemes
described previously. The homology arms were amplified from w1118 genomic
DNA. The primers 5′-GATCGCTAGCCGAGAAAGACACAAACGAAGATCAAC-3′ and
5′-GATCGGTACCCGATTCGATTACACATAGATATAGAATAG-3′ were used to amplify the
upstream 5-kilobase (kb) homology arm, which was cloned into pTV as an
NheI-KpnI fragment. The primers 5′-GATCACTAGTGTTATCAAAAGCGAACGCCGCAATAC-3′
and 5′-GATCAGATCTCTGGAATTGATTTGATTCGATTGCGGTG-3′ were used to amplify the
downstream 3-kb homology arm, which was cloned into pTV as a SpeI-BglII fragment.
notumKO deletes an 82-base-pair (bp) sequence of the first exon that encodes the signal
sequence. As expected, notumKO behaved as a null. It was recombined onto FRT2A
for clonal analysis.
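
As a quick sanity check on the cloning strategy described above, the following minimal Python sketch confirms that each homology-arm primer carries the recognition site of the enzyme named for its cloning step. The primer sequences are taken from the text; the recognition sequences are the standard ones for NheI, KpnI, SpeI and BglII. The dictionary keys and the function name are hypothetical labels introduced only for illustration.

```python
# Standard recognition sequences (5'->3') for the enzymes named in the text.
SITES = {
    "NheI": "GCTAGC",
    "KpnI": "GGTACC",
    "SpeI": "ACTAGT",
    "BglII": "AGATCT",
}

# Primer sequences as given in the Methods, paired with the enzyme expected
# from the cloning description. The keys are illustrative labels only.
PRIMERS = {
    "upstream_arm_primer_1": ("GATCGCTAGCCGAGAAAGACACAAACGAAGATCAAC", "NheI"),
    "upstream_arm_primer_2": ("GATCGGTACCCGATTCGATTACACATAGATATAGAATAG", "KpnI"),
    "downstream_arm_primer_1": ("GATCACTAGTGTTATCAAAAGCGAACGCCGCAATAC", "SpeI"),
    "downstream_arm_primer_2": ("GATCAGATCTCTGGAATTGATTTGATTCGATTGCGGTG", "BglII"),
}


def has_site(primer: str, enzyme: str) -> bool:
    """Return True if the primer contains the enzyme's recognition sequence."""
    return SITES[enzyme] in primer.upper()


# Map each primer label to whether its expected site is present.
site_check = {name: has_site(seq, enz) for name, (seq, enz) in PRIMERS.items()}

for name, ok in site_check.items():
    print(f"{name}: expected site present = {ok}")
```

All four sites used here are palindromic, so a search on the primer strand alone is sufficient; a non-palindromic site would also require checking the reverse complement.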

Transgene to express Dlp-CD8. Dlp-CD8 was made by replacing the C terminus,
where the GPI anchor is normally added, with the mouse CD8 transmembrane domain
and GFP. The primers 5′-GATGAATTCGGCGCGCCATGCTACATCAGCAGCAACAAC-3′ and
5′-GCATGCGGCCGCCTCGATTGTCATTGGCCCCG-3′
were used to amplify 2,193 bp of the cDNA encoding a polypeptide lacking Asp 734,
where GPI is normally appended. This fragment was cloned in frame as an
EcoRI-NotI fragment in UAS-HRP-CD8-GFP (deleting horseradish peroxidase (HRP))
and the Dlp-CD8-encoding fragment was then transferred to pMTV5 as an
EcoRI-XhoI fragment. From there it was transferred to pTubulin as a KpnI-MluI fragment.
This transgene rescued viability and wing patterning in dlp mutant homozygotes,
which otherwise do not survive beyond pupal stages, a strong indication that Dlp
does not need to be GPI anchored for normal development.

Expression vectors for cultured Drosophila cells. Drosophila S2 or S2R+ cells
(Drosophila Genomics Resource Centre, DGRC) were cultured at 25 °C in Schneider’s
medium plus L-glutamine (Sigma) containing 10% (v/v) fetal bovine serum (FBS; Life
Technologies) and 0.1 mg ml⁻¹ pen/strep (Life Technologies). To generate plasmids
expressing V5-tagged dNotum, the dNotum cDNA (from S. Cohen) was amplified,
adding a V5 tag (GKPIPNPLLGLDST) at the C terminus. This fragment was then
inserted into pActin, pUAST or pLotattB to generate pAct-Notum-V5, pUAST-Notum-V5
or pLotattB-Notum-V5, respectively. A stable S2 line expressing V5-tagged
dNotum (S2 act-Notum-V5) was generated by transfection of S2 cells with
pAct-Notum-V5 and pCoHygro (Invitrogen) followed by drug selection. Wingless
was expressed from pTub-Wg, which was prepared by inserting the Wg cDNA from
pKS-Wg into pTubulin. HA-tagged Dally was expressed from pAct-Dally-HA,
prepared by inserting Dally-HA excised from pMT-Dally-HA (from S. Cohen) into
pActin. To conveniently manipulate the coding sequence of Dlp, three nucleotides
(GTC) were inserted at positions 2100-2102 (nucleotide numbering with the A of the
first codon at position 1) to introduce a SalI site in KS-Dlp. This was used to insert
DNA encoding an HA tag flanked by glycine residues (GYPYDVPDYAG) and thus
generate pKS-Dlp-HA. The Dlp-HA was then inserted into pTubulin to make
pTub-Dlp-HA.

PIPLC treatment of imaginal discs and cultured cells. Wing imaginal discs were
treated with PIPLC as previously described, with some modifications. In brief,
discs were dissected from third instar larvae and incubated in Schneider’s medium
with 10% FBS containing 10 U ml⁻¹ of PIPLC (Molecular Probes) at room temperature
for 30 min. After treatment, the discs were washed three times with Schneider’s
medium before extracellular staining (no detergent). S2 cells transfected with
pTub-Dlp-HA and pActin (mock), or pTub-Dlp-HA and pAct-Notum-V5, as well as the
corresponding conditioned medium, were collected (total volume 300 µl) and
treated with PIPLC (final concentration 1 U ml⁻¹) for 1.5 h at 25 °C before phase
separation.

Phase separation assay. The phase separation assay was performed as previously
described with some modifications. After PIPLC treatment, 200 µl of
pre-condensed Triton X-114 (Sigma) was added to the reaction mixtures (Triton
X-114 final concentration ~2%). The extracts were incubated for 15 min on ice
and then centrifuged at 10,000g for 10 min at 4 °C. The supernatants were
transferred to new tubes and warmed at 37 °C in a water bath for 10 min. After a second
centrifugation (10,000g for 10 min at room temperature), the upper phases
(aqueous) and lower phases (detergent) were collected separately and mixed with 4×
sample buffer (Life Technologies) for analysis by immunoblotting.

Immunoblotting. Samples were run on 4–12% Bis-Tris NuPAGE gels (Invitrogen)
with MOPS buffer. Proteins on the gel were transferred onto nitrocellulose
membrane using the iBlot gel transfer system (Invitrogen). The membranes were washed
with dH2O and blocked with 5% skimmed milk in 0.1% Tween-20 PBS (PBS-T) for
30 min at room temperature. Membranes were incubated with primary antibodies
(mouse monoclonal anti-V5; Life Technologies, 1:5,000 and rat anti-HA; Roche,
1:2,500) diluted in 5% milk PBS-T overnight at 4 °C and washed with PBS-T three
times before incubation with HRP-conjugated secondary antibodies (anti-mouse
or anti-rat; Biorad, 1:5,000). Membranes were washed again in PBS-T, developed
using the ECL Prime western blotting detection system (GE Healthcare) and exposed
to film.

Glycan array. Conditioned medium (CM) obtained from S2 cells expressing V5-tagged dNotum was
overlaid on a focused neoglycolipid-based glycan array containing lipid-linked GAG
oligosaccharide probes (see http://www1.imperial.ac.uk/glycosciences/ and refs 56, 57)
and allowed to bind for 90 min. The array was then washed and stained with anti-V5 
mouse monoclonal antibody (Invitrogen) followed by biotinylated anti-mouse IgG 
(Sigma). Binding was detected with Alexa Fluor 647-labelled streptavidin. Fluores- 
cence intensity was quantified and data analysis was performed with dedicated micro- 
array software. No binding was observed when control medium was used instead 
of the conditioned medium or when the anti-V5 was used in the absence of the 
dNotum medium (data not shown). 

Large-scale expression of Notum constructs. The cDNA coding for mature 
hNOTUM (residues Arg 38-Ser 496) was cloned into the pHLsec vector, which adds
a C-terminal His6- or His10-tag. After the crystal structure was solved in crystal
forms I and II (see below) and the folded region identified, a shorter construct,
hNOTUMcore, comprising Ser 81-Thr 451, Cys330Ser, was found to provide higher
expression levels, thanks in part to the removal of the non-conserved Cys 330, 
which provides a free, surface-exposed sulfhydryl. Expression of wild-type protein 
resulted in non-quantitative spontaneous crosslinking of the protein, a problem 
that was not observed with the Cys330Ser variant. 

For dNotum, we initially attempted to express Asp 83-Thr 617. However, a large
unstructured and non-conserved domain of 22 kilodaltons (kDa) (Arg 416-Lys 597)
was found to interfere with crystallization. This domain, which was not present in
hNOTUM, was deleted and replaced by GNNNG to generate dNotumΔloop. Note
that this domain could provide an additional glypican-binding surface since it is
highly basic (pI = 12.4). Proteins were transiently expressed in HEK293T cells and
purified as described. Proteins for crystallization were expressed either in
GnTI-deficient HEK293S cells or in HEK293 cells treated with kifunensine (1 mg l⁻¹).
Before crystallization the proteins were treated with endoglycosidase F1 at a ratio
of 1:100. Proteins intended for kinetic studies were stored in 10 mM Tris-HCl,
pH 7.5, 1 mM EDTA, 50 mM NaCl and 50% (v/v) glycerol at −20 °C.

SPR equilibrium binding studies. Affinity between variants of hNOTUM and
GPC3 or sulfated GAG was measured at 25 °C in 10 mM HEPES/NaOH, pH 7.5,
150 mM NaCl and 0.005% Tween20 using a Biacore T200 machine (GE Healthcare).
GPC3 constructs (see below) or sulfated GAGs were coupled to a
streptavidin-coated sensor chip via a biotin label and purified Notum proteins were used as
analyte. Biotinylated GAGs were produced as described. To produce biotinylated
GPC3 we proceeded as follows. The cDNA encoding human GPC3 (full-length
except for the lack of endogenous signal sequence, Pro 31-Asn 538) or GPC3ΔGAG
(lacking a C-terminal stretch that normally contains the GAG attachment sites,
Pro 31-Phe 493) was cloned into a variant of the pHLsec vector, which introduces
a recognition sequence for the Escherichia coli BirA enzyme at the C terminus.
Biotinylation at this sequence tag was performed by co-transfection of HEK293T
cells with the GPC3 construct and an E. coli BirA expression construct. The
synthetic BirA gene was codon-optimized and carried a C-terminal KDEL-tag for
retention in the endoplasmic reticulum. The BirA plasmid was used at 20% of total DNA.
The expression medium was supplemented with 100 µM of sterile biotin prepared
as a 2 mM stock in PBS. After 3 days, the conditioned medium was cleared of
cell debris and repeatedly buffer-exchanged to remove free biotin. The chip surface
was precoupled with approximately 10,000 resonance units (RU) of streptavidin.
Approximately 500 RU of GPC3 was immobilized. The amount of immobilized
GAGs could not be measured. After each injection of analyte the chip surface was
regenerated with 1 M NaCl, 10 mM HEPES/NaOH, pH 7.5, to return to baseline
levels. Data were fit to a Langmuir adsorption model, B = Bmax × C/(Kd + C), where
B is the amount of bound analyte and C is the concentration of analyte in the
sample. Data were then normalized to a maximum analyte binding value of 100. For the
design of heparin binding site mutants, the following considerations were taken
into account. If, based on the crystal structure, the hydrophobic part of the side
chain (for example, Arg, Lys, His) was estimated to be of no structural importance,
then the residue was mutated to serine. In all other cases it was mutated to glutamine
(Arg, Lys) or asparagine (His) to keep the overall structure as native as possible.
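
The Langmuir fit and the normalization to a maximum binding value of 100 can be illustrated with a minimal Python sketch. It is not the authors' fitting procedure: the text does not state which software or algorithm was used, so a Hanes-Woolf linearization (C/B = C/Bmax + Kd/Bmax) with ordinary least squares is swapped in here to keep the example dependency-free. The concentrations, the "true" Bmax and Kd (the dissociation constant in the model), and all function names are hypothetical.

```python
def langmuir(c, bmax, kd):
    """Langmuir adsorption model: bound analyte B at analyte concentration C."""
    return bmax * c / (kd + c)


def fit_langmuir(conc, bound):
    """Estimate (Bmax, Kd) by least squares on the Hanes-Woolf form.

    C/B = C/Bmax + Kd/Bmax is linear in C, with slope 1/Bmax and
    intercept Kd/Bmax. This is an illustrative stand-in for a
    nonlinear fit.
    """
    xs = list(conc)
    ys = [c / b for c, b in zip(conc, bound)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    bmax = 1.0 / slope
    kd = intercept * bmax
    return bmax, kd


# Hypothetical, noise-free equilibrium responses (RU) at six analyte
# concentrations (µM), generated from assumed parameters.
conc = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
bound = [langmuir(c, bmax=250.0, kd=2.0) for c in conc]

bmax, kd = fit_langmuir(conc, bound)

# Normalize so that the fitted maximum binding corresponds to 100.
normalized = [100.0 * b / bmax for b in bound]
print(f"Bmax = {bmax:.1f} RU, Kd = {kd:.2f} µM")
```

With real, noisy sensorgram plateaus a direct nonlinear least-squares fit is generally preferable, because the linearization distorts the error structure at low concentrations.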
Heparin affinity chromatography. We compared the affinity of hNOTUM
variants for heparin using a 1 ml HiTrap Heparin HP column (GE Healthcare). The
column was equilibrated in 10 mM Tris-HCl, pH 8.0. Sample protein (120 µg),
diluted into binding buffer, was injected onto the column. After washing of the
column with five column volumes of binding buffer the protein was eluted in a linear
gradient to 1.0 M NaCl over 20 column volumes. The flow rate was 2 ml min⁻¹.
Chromogenic Notum activity assays. Steady-state carboxylesterase activity
measurements of hNOTUM were performed in 50 mM MES/NaOH, 100 mM NaCl,
pH 6.5, using different chromogenic pNP esters (Sigma; number indicates
carboxylic acid chain length). Substrate stocks in DMSO were adjusted to
concentrations between 20 mM (pNP16) and 2 M (pNP2) and diluted into reaction buffer.
In tests using the short and soluble substrates pNP2 and pNP4, the final DMSO
concentration was only 0.1%. In tests using longer pNP substrates or in
comparative studies the final DMSO and Triton concentrations were kept constant at 2.5%
(v/v) and 0.5% (w/v), respectively. The required amount of a 20-mM substrate stock
was first mixed 1:1 with a 20% (w/v) solution of Triton X-100 in reaction buffer. The
resulting emulsion was then diluted with reaction buffer and vigorously agitated
to avoid precipitation. Reactions were started by addition of 5-10 µl of protein at
concentrations between 0.1 and 4 mg ml⁻¹. Substrate turnover was measured using a Varian
Cary 50 spectrophotometer by following the absorption change at 405 nm. The
extinction coefficient of p-nitrophenol in reaction buffer was established to be
4,070 M⁻¹ cm⁻¹. Although Triton X-100 was required to maintain the solubility
of long ester substrates and fatty acids, it was itself an inhibitor of Notum (Extended
Data Fig. 8a, b). We assume that the hydrophobic region of Triton X-100 has a
propensity to bind to the active site pocket. This notion is supported by the
observation that the much larger sterol-based detergent CHAPS evokes no inhibition.
On the basis of this assumption of competitive inhibition by Triton X-100 we
determined its inhibition constant to be 466 µM and used it to calculate a corrected
Michaelis constant for pNP8 turnover.
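
The two calculations implied by this paragraph can be sketched in Python: converting an absorbance slope at 405 nm into a turnover rate with the Beer-Lambert law, and correcting an apparent Michaelis constant for competitive inhibition using Km = Km,app / (1 + [I]/Ki). The extinction coefficient (4,070 M⁻¹ cm⁻¹) and Ki (466 µM) are taken from the text. The 1 cm path length, the average molecular weight of about 625 g mol⁻¹ used to convert % (w/v) Triton X-100 into a molar concentration, the example input values and the function names are assumptions made for illustration only.

```python
EPSILON_PNP = 4070.0     # M^-1 cm^-1, p-nitrophenol in reaction buffer (from the text)
KI_TRITON_UM = 466.0     # µM, inhibition constant of Triton X-100 (from the text)
TRITON_MW = 625.0        # g/mol, ASSUMED average molecular weight of Triton X-100


def rate_from_absorbance(delta_a_per_min, path_cm=1.0):
    """Convert an A405 slope (per min) to product formation in µM per min.

    Beer-Lambert: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).
    The 1 cm path length is an assumption.
    """
    return delta_a_per_min / (EPSILON_PNP * path_cm) * 1e6


def triton_concentration_um(percent_w_v, mw=TRITON_MW):
    """Convert % (w/v) Triton X-100 to µM (1% w/v = 10 g per litre)."""
    grams_per_litre = percent_w_v * 10.0
    return grams_per_litre / mw * 1e6


def corrected_km(km_apparent, inhibitor_um, ki_um=KI_TRITON_UM):
    """Remove the effect of a competitive inhibitor from an apparent Km.

    For competitive inhibition Km,app = Km * (1 + [I]/Ki).
    """
    return km_apparent / (1.0 + inhibitor_um / ki_um)


# Hypothetical example: a slope of 0.0407 A405 per min and an apparent Km of
# 500 µM measured in 0.5% (w/v) Triton X-100.
rate = rate_from_absorbance(0.0407)
triton_um = triton_concentration_um(0.5)
km_true = corrected_km(500.0, triton_um)
print(f"rate = {rate:.1f} µM/min, [Triton] = {triton_um:.0f} µM, Km = {km_true:.1f} µM")
```

Under these assumptions 0.5% (w/v) Triton X-100 corresponds to roughly 8 mM, far above the stated Ki, so the correction factor is large and the result is sensitive to the assumed detergent molecular weight.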

Cell-based Notum activity assays. To assay Wingless signalling in Drosophila cells,
a modified TOPFlash vector called WISIR, comprising a TCF-responsive promoter
driving Firefly luciferase and a ubiquitous promoter driving Renilla luciferase, was
used. To assess the repressive activity of dNotum, S2R+ cells were transfected
separately in 6-well plates with pTub-Wg (2 µg), pAct-Notum-V5 (2 µg) and
WISIR (0.3 µg). The transfected cells were then cultured for 2 days at 25 °C and
then mixed in equal ratio. Firefly and Renilla luciferase levels were measured 24 h
later with the Dual-Glo luciferase reporter assay system. As controls, cells transfected
with WISIR alone or with WISIR and pTub-Wg were mixed with mock-treated
cells. Firefly luciferase activity was normalized to Renilla luciferase activity and the
average of triplicate samples was calculated.
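
The dual-luciferase normalization described above amounts to a per-well Firefly/Renilla ratio averaged over replicates. A minimal Python sketch follows; the raw counts are hypothetical, the function name is introduced for illustration, and the fold-repression step at the end is an added convenience rather than something stated in the text.

```python
from statistics import mean


def normalized_reporter(firefly, renilla):
    """Mean Firefly/Renilla ratio across replicate wells."""
    if len(firefly) != len(renilla):
        raise ValueError("Firefly and Renilla replicate counts must match")
    return mean(f / r for f, r in zip(firefly, renilla))


# Hypothetical raw luminescence counts for triplicate wells.
wg_only = normalized_reporter([9000, 9600, 8400], [300, 320, 280])
wg_plus_notum = normalized_reporter([1500, 1600, 1400], [300, 320, 280])

# Illustrative summary: how strongly reporter activity is reduced.
fold_repression = wg_only / wg_plus_notum
print(f"Wg only: {wg_only:.1f}, Wg + Notum: {wg_plus_notum:.1f}, "
      f"fold repression: {fold_repression:.1f}")
```

Taking the ratio per well before averaging, as done here, corrects each replicate for its own transfection efficiency and cell number.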

hNOTUM inhibition of Wnt signalling in mammalian cells was assessed in
stably transfected SuperTopFlash (STF) HEK293 cells. These were treated with
conditioned medium from Wnt3A-producing L cells with or without recombinant
purified hNOTUM. To reveal the direct action of hNOTUM on Wnt we
proceeded as follows. Wnt3A CM was dialysed for 24 h against ten volumes of
tissue-culture-grade PBS and then sterile-filtered with the aim of removing chelators
that might interfere with TALON binding (see below). To 500 µl of such dialysed
Wnt3A, 5 µl hNOTUM protein or an unrelated control protein (mock) at a
concentration of 1 mg ml⁻¹ was added and incubated for the indicated time at room
temperature (23 °C). To stop the enzymatic reaction we added 50 µl of fresh 50%
slurry of cobalt affinity beads (TALON resin) equilibrated against
tissue-culture-grade PBS and 5 µl of 500 mM imidazole in PBS. After vigorous shaking the solution
was incubated for 1 h at room temperature on a vertical rotator. Beads containing
the His10-tagged hNOTUM protein were removed by centrifugation (3,000g, 5 min)
and discarded. The supernatant was cleared again by centrifugation at maximum
speed (16,000g, 5 min). 100 µl of the reaction solution was then added to STF cells
seeded the previous day at 50,000 cells per 100 µl and per well into 96-well plates.
The Wnt-induced luciferase activity was measured after 16-20 h using the Glo kit
(Promega) and an Ascent Luminoskan luminometer (Labsystems) following the
instructions of the manufacturer. Data represent the average of quadruplicate
measurements ± s.d. The incubation time with cells was kept constant for all compared
samples.

To assess hNOTUM inhibition of Norrin signalling, STF cells seeded in 96-well
plates were transfected after 24 h with 200 ng DNA: 60 ng each of hFZ4 and hLRP6
plasmids, 30 ng of Tspan-12 plasmid, and 50 ng constitutive Renilla luciferase
plasmid (pRL-TK from Promega). Cells were stimulated 24 h after transfection with
10 µg ml⁻¹ recombinant Norrin (T.-H. Chang et al., manuscript in preparation),
which had been preincubated for 24 h with 10 µg ml⁻¹ hNOTUM variants or FCS
as a control. Firefly and Renilla luciferase activities were measured 48 h later with the
Dual-Glo luciferase reporter assay system (Promega). Firefly luciferase activity was
normalized to Renilla luciferase activity and the average of triplicate samples was
calculated.

Crystallization, data collection and structure determination. Concentrated
proteins were subjected to sitting drop vapour diffusion crystallization trials
employing a Cartesian Technologies pipetting robot; drops typically consisted of 100-300 nl
of protein solution and 100 nl of reservoir solution. A detailed discussion of the
multiple conditions in which crystal growth occurred is provided in
Supplementary Information. Standard methods were used for X-ray diffraction data collection
and structure determination; distinctive details for the series of crystal structures
are discussed in Supplementary Information.

Mass spectrometric analysis of the effect of Notum on Wnt3A protein. As a
general method to quantify the levels of delipidated Wnt3A protein by
LC-MS-MS, we used an isotope-coded alkylation reaction targeting cysteines to multiplex
the mass spectrometry signal. One sample was reacted with heavy iodoacetamide (IAA,
13C2D2H2INO) and the other with the light version (C2H4INO); consequently,
peptide signal doublets appeared at Δ4 Da per cysteine, with peak areas used for
relative quantification. Wnt3A protein (500 ng, purified from L cell conditioned
medium by K. Dingwell (NIMR) according to ref. 4) was mixed with purified
hNOTUM (1 µl of purified hNOTUMcore at 25 ng µl⁻¹) in the following buffer:
20 mM Tris-HCl buffer (pH 7.5) containing 500 mM NaCl, 0.5 mM EDTA, 0.5%
CHAPS and 5% glycerol, and left together for 16 h at 25 °C. The reaction was quenched
by addition of 4× LDS sample buffer (Life Technologies). Coomassie blue-stained
bands from SDS-PAGE were excised from the gel, cut in half and destained
by incubating for 45 min with 200 mM ammonium bicarbonate (ABC)/60%
acetonitrile (ACN). To reduce cysteines, buffer was refreshed with the inclusion
of 10 mM dithiothreitol (DTT) for 15 min. After washing, half of the gel pieces
were incubated in 20 mM heavy or light IAA in 100 mM ABC/60% ACN buffer in
the dark for 30 min. Proteins were digested using a 4 h in-gel trypsin digestion step
in 100 mM ABC and then quenched with 0.1% TFA. Equal aliquots of heavy and
light reaction were mixed to generate forward and reverse labelled samples.
Duplicate LC-MS analysis was performed using an Ultimate3000 RSLC system coupled
to a LTQ-Orbitrap Velos-Pro mass spectrometer (Thermo Scientific). The
instrument was operated in an alternating targeted MS/MS and data-dependent
acquisition mode. CHGLSGSCEVK and the control peptide AGIQECQHQFR were
targeted for MS/MS. MS/MS spectra were searched using Mascot v2.3 and
identifications imported as a spectral library into Skyline software v2.6.0.6709. Skyline
was used for peak extraction and area determination.
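
The quantification logic of this labelling scheme can be sketched in a few lines of Python. The nominal 4 Da shift per cysteine and the two peptide sequences come from the text; the charge states, the peak areas and the function names are hypothetical, and the code is an illustration of the arithmetic rather than a description of how Skyline computes its values.

```python
HEAVY_SHIFT_DA = 4.0  # nominal heavy-minus-light mass shift per alkylated cysteine


def doublet_spacing(peptide, charge):
    """Expected m/z spacing between light and heavy forms of a peptide."""
    n_cys = peptide.upper().count("C")
    return n_cys * HEAVY_SHIFT_DA / charge


def treated_over_mock(heavy_area, light_area, treated_label):
    """Treated/mock peak-area ratio, given which label the treated sample carries.

    Swapping labels between replicates (label reversal) should give the
    same ratio if the effect is real rather than a labelling artefact.
    """
    if treated_label == "heavy":
        return heavy_area / light_area
    if treated_label == "light":
        return light_area / heavy_area
    raise ValueError("treated_label must be 'heavy' or 'light'")


# The target peptide has two cysteines, the control peptide has one.
target_spacing = doublet_spacing("CHGLSGSCEVK", charge=2)
control_spacing = doublet_spacing("AGIQECQHQFR", charge=2)

# Hypothetical peak areas for a forward and a label-reversed experiment.
forward = treated_over_mock(heavy_area=300.0, light_area=100.0, treated_label="heavy")
reverse = treated_over_mock(heavy_area=100.0, light_area=300.0, treated_label="light")
print(target_spacing, control_spacing, forward, reverse)
```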

Mass spectrometric analysis of Wnt3A and Shh peptides. Delipidation assays
were performed by reacting 3 µg of synthetic peptides (synthesis described in
Supplementary Information) with 1 µl of enzyme (hNOTUMcore or hNOTUMcore
(S232A); 25 ng µl⁻¹) in 20 mM ammonium bicarbonate buffer (total volume 5 µl)
for 16 h at 25 °C. The reaction was quenched with 0.1% TFA and samples were
desalted using C18 ZipTips. Samples were prepared in α-cyano-4-hydroxycinnamic
acid in 50:50 water/acetonitrile with 0.1% TFA. MALDI-TOF spectra were acquired
using an AB SCIEX 5800 TOF/TOF system and analysed using Data Explorer v4.11.
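
For orientation, the final concentrations implied by these volumes can be worked out directly. The sketch below assumes the quantities as read from the text (1 µl of enzyme stock at 25 ng µl⁻¹, 3 µg of peptide, 5 µl total volume); the function name is a hypothetical helper, not part of any published protocol.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Concentration after diluting a stock into the total reaction volume (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_VOLUME_UL = 5.0

# Enzyme: 1 µl of a 25 ng/µl stock in a 5 µl reaction.
enzyme_ng_per_ul = final_concentration(25.0, 1.0, TOTAL_VOLUME_UL)

# Peptide: 3 µg in a 5 µl reaction.
peptide_ug_per_ul = 3.0 / TOTAL_VOLUME_UL

# Mass ratio of peptide to enzyme in the reaction (both in ng).
peptide_to_enzyme = (3.0 * 1000.0) / (25.0 * 1.0)

print(enzyme_ng_per_ul, peptide_ug_per_ul, peptide_to_enzyme)
```

Under these assumptions the peptide is present in a large mass excess over the enzyme.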
Statistics. No statistical methods were used to predetermine sample size. 
Extended Data Figure 1 | Notum modulates Wingless, but not Dpp or
Hedgehog signalling. a, b, Overexpression of dNotum-V5 with the apterous-Gal4
driver, which is expressed in the dorsal compartment, prevents expression
of Senseless (Sens) (b, middle), a Wingless target gene, but has little effect
on phospho-Mad (pMad) immunoreactivity (b), an indicator of Dpp
signalling. c, Loss of notum activity, achieved by generating large patches of
notum®® tissue (see Methods), marked by the loss of GFP, leads to broadening
of Senseless expression but does not affect pMad immunoreactivity.
d-g, Strong, but not complete, reduction of notum activity led to ectopic
wing margin bristles (compare insets in d and e) but had no significant effect on
wing area, which is sensitive to Dpp signalling (f) (P = 0.26, Student's t-test),
or on the distance between L3 and L4 veins, which is affected by changes in
Hedgehog signalling®’ (g) (P = 0.41, Student’s t-test). In total, 19 control
(notum'”*) and 17 mutant (notum'”*°) wings were analysed. Error bars
in f and g are s.d.
Extended Data Figure 2 | dNotum does not cleave the GPI anchor of
glypicans. a, b, Ectopic expression of Senseless caused by NRT-wingless, as well 
as endogenous Senseless, is suppressed by co-expression of dNotum. NRT- 
wingless and notum are expressed in a vertical band under the control of dpp- 
Gal4. c, Western blot analysis of phase-separated extracts of S2 cells transfected 
with a plasmid expressing HA-tagged Dally. In control extracts, Dally is found 
largely in the detergent (D) phase. Coexpression of dNotum-V5 from a 
plasmid had no effect, while treatment with PIPLC shifted all detectable Dally 
to the aqueous (A) phase. d, dNotum-V5 expression as in c was sufficient to 
suppress Wingless-induced TOPFlash activity. Cells were transfected with a
dual luciferase TOPFlash reporter® along with a mock plasmid (-),
tubulin::wingless (Wg), or tubulin::wingless + actin::notum-V5 (Wg +
Notum). e-h, Extracellular Dlp in control (e, g), PIPLC-treated (f) or apterous-
Gal4 UAS-notum-V5 (h) imaginal discs. i-l, Extracellular anti-GFP staining of
imaginal discs from gene trap line expressing Dally-GFP fusion protein.
Discs were treated with a mock solution (i) or PIPLC (j) (same discs as in e or
f, respectively, but here showing Dally protein). In a separate experiment,
dNotum was overexpressed with apterous-Gal4 in the Dally-GFP background
(l). No change in the distribution of extracellular GFP could be seen compared
to that in control discs (k, no apterous-Gal4).
Extended Data Figure 3 | dNotum requires Dally to inhibit Wingless
signalling. a, Wingless and Senseless expression in a dally−/− wing imaginal
disc expressing NRT-wingless and notum under the control of dpp-Gal4.
Some Senseless expression remains, indicating that, in the absence of Dally,
dNotum is a poor inhibitor of NRT-Wingless-induced (as well as endogenous)
signalling. b-d, Anterior margin of wings from control, spalt (sal)-Gal4
UAS-notum-V5, and sal-Gal4 UAS-notum-V5 dally−/− animals. Removal of
dally rescues the loss of margin bristles caused by dNotum overexpression.
Extended Data Figure 4 | dNotum binds to sulfated glycans. Binding of dNotum-V5 to a GAG oligosaccharide array, detected by immunofluorescence.
CSA/B/C, chondroitin sulfate A/B/C; HA, hyaluronic acid; hep, heparin; HS, heparan sulfate. Details on the array are provided in the Methods.
Extended Data Figure 5 | Additional structural information on Notum.
a, Topology plot of hNOTUM. β-strands are shown as numbered triangles and
α-helices as circles labelled in alphabetical order from the N to C terminus
(NT to CT). Structural elements conserved among most α/β-hydrolases are
outlined in grey. b, Comparison of the two most conformationally distinct
hNOTUM structures (from crystal forms III and V). Crystal form III is the
most structurally different. All other structures superimpose with root mean
squared deviation (r.m.s.d.) of <0.7 Å. Circles highlight the most flexible
regions. c, Comparison between the structures of hNOTUM (form V) and
dNotum (form I). The circle highlights the lack of a cysteine bridge in dNotum.
Extended Data Figure 6 | Structural and biophysical analysis of heparin
binding. a, Heparin affinity chromatography of wild-type hNOTUM and 
selected surface variants. b-e, Close-up views of additional sulfate binding sites 
Extended Data Figure 7 | Relation of Notum to other esterases of the α/β
hydrolase family. a, Comparison between hNOTUM and human esterase
D (hESTD), showing structural relatedness. hNOTUM is also related to
hAPT1, a cytosolic esterase used in this study as a positive control for fatty acid
esterase activity. In the views shown here, the hNOTUM structure has been
rotated by 90° around the x axis relative to the structure shown in Fig. 3b.
b, Rootless phylogenetic tree of animal Notum proteins (red) and plant
pectin acetylesterases (PAE, green). Extent of sequence identity to hNOTUM is
shown next to species name. Percentages between branches indicate sequence
identity between neighbours.
Extended Data Figure 8 | Substrates and inhibitors of hNOTUM.
a, Inhibition of hNOTUM activity on pNP-butyrate (pNP4) by PMSF (30 min
pre-incubation with 2 mM PMSF) as well as by Triton X-100 and CHAPS
(0.5%). Presence of 20 mM SOS and 50 mg l⁻¹ heparin results in a minor
increase of esterase activity. The height of each bar represents activity relative to
the mean of four control samples lacking the additives. b, Saturable inhibition
of hNOTUM by Triton X-100. Triton X-100 inhibits many esterases owing
to binding to the acyl binding pocket through its hydrophobic group. c, Lack of
inhibition of Norrin-mediated β-catenin stabilization by Notum. Recombinant
Norrin was pretreated with hNOTUMcore at a concentration sufficient to
suppress Wnt3A-mediated signalling. d, e, Saturation kinetics of the action of
hNOTUM on pNP-octanoate (pNP8, d) and pNP-butyrate (pNP4, e).
The activity was normalized to the Vmax calculated for hNOTUMcore. The
activity values for the larger, full length protein were adjusted to compensate
for the increased mass. Apparent Km values in d were corrected for the
inhibition caused by Triton X-100. f, Saturation inhibition kinetics with
myristoleic and palmitoleic acid. pNP8 was used at a concentration of 1 mM
and 250 µM, respectively.
Extended Data Figure 9 | Additional mass spectrometric analysis of the
hNOTUM deacylase activity. a, Mass spectra of CHGLSGSCEVK from
trypsinized Wnt3A protein mock-treated or treated with hNOTUMcore.
Left-hand graph is the same as that shown in Fig. 5a, while the right-hand
side shows the results of a separate experiment performed with the labels
reversed. b, Duplicate LC-MS peak areas with label reversal. Irrespective of the
nature of the label (grey indicates light label, black indicates heavy label),
hNOTUMcore triggered an increase in peak area of the delipidated Wnt3A
tryptic peptide. c, d, Two control Wnt3A cysteine-containing peptides from the
same data set were not affected by hNOTUMcore. e, Activity of hNOTUMcore
and its Ser232Ala variant on a synthetic disulphide-bonded Wnt3A peptide
(CHGLSGSCEVK) palmitoleoylated on the first serine. Both lipidated and
unlipidated peptide could be detected by MALDI-TOF. Incubation with
hNOTUMcore, but not its Ser232Ala variant, caused significant delipidation
(peak corresponding to delipidated peptide is marked by asterisk).
Quantification of triplicate experiments is shown in Fig. 5c. f, MALDI-TOF
analysis shows that neither hNOTUMcore nor its Ser232Ala variant delipidated
a synthetic SHH peptide (CGPGRGFGKRR) palmitoylated on its N-terminal
cysteine. Quantification of triplicate experiments is shown in Fig. 5d
(peak corresponding to lipidated peptide is marked by black triangle). g, Two-dimensional
active site schematic relating to Fig. 5e. Additional hydrogen
bonds and electron pair movements thought to occur during hydrolysis by
the wild type protein are shown in grey. h, Close-up view on the myristoleate
active site complex of hNOTUMcore (crystal form I). The experimental omit
electron density is contoured at 2σ.
Integrase-mediated spacer acquisition
during CRISPR-Cas adaptive immunity

Bacteria and archaea insert spacer sequences acquired from foreign DNAs into CRISPR loci to generate immunological
memory. The Escherichia coli Cas1-Cas2 complex mediates spacer acquisition in vivo, but the molecular mechanism of
this process is unknown. Here we show that the purified Cas1-Cas2 complex integrates oligonucleotide DNA substrates
into acceptor DNA to yield products similar to those generated by retroviral integrases and transposases. Cas1 is the
catalytic subunit and Cas2 substantially increases integration activity. Protospacer DNA with free 3’-OH ends and supercoiled
target DNA are required, and integration occurs preferentially at the ends of CRISPR repeats and at sequences adjacent to
cruciform structures abutting AT-rich regions, similar to the CRISPR leader sequence. Our results demonstrate the Cas1-Cas2
complex to be the minimal machinery that catalyses spacer DNA acquisition and explain the significance of CRISPR
repeats in providing sequence and structural specificity for Cas1-Cas2-mediated adaptive immunity.


Prokaryotic adaptive immunity relies on clustered regularly interspaced 
short palindromic repeats (CRISPRs) together with CRISPR associated 
(Cas) proteins to detect and destroy foreign nucleic acids’. CRISPR loci 
contain an AT-rich leader sequence followed by repetitive sequence ele- 
ments flanking ~30 base pair (bp) spacer segments that are transcribed 
to produce precursor CRISPR RNAs (pre-crRNAs)*°. Spacers are frequently
virus- or plasmid-derived, although ‘self’-derived spacers from
the host chromosome are present in some CRISPR loci®. After pre-crRNA
processing and assembly with Cas proteins, the resulting surveillance 
complexes target and cleave foreign nucleic acids bearing sequences 
complementary to the crRNA spacer sequence’’*. How spacer DNA 
sequences, termed protospacers, are acquired into the host CRISPR locus 
remains unknown. 

Overexpression of Cas1 and Cas2 nucleases, the only Cas proteins
found in all CRISPR-Cas systems, leads to the site-specific acquisition
of 33 bp protospacers at the leader end of the CRISPR locus in E. coli’.
Furthermore, Cas1 and Cas2 function as a complex in vivo"*, suggesting
that the Cas1-Cas2 complex might possess DNA recombination
activity. We reconstituted CRISPR spacer acquisition using purified Cas1
and Cas2 proteins, protospacers and acceptor plasmid DNA, revealing
an elegant mechanism in which both the sequence and structural elements
of the CRISPR repeats specify spacer integration sites.


Protospacer DNA integration by Cas1-Cas2 


To test whether the Cas1-Cas2 complex is sufficient to catalyse DNA 
recombination in vitro, assays were conducted using purified Cas1- 
Cas2 complex, 33 bp protospacer DNA and an acceptor ‘target’ plas- 
mid consisting of the pUC19 backbone with an inserted CRISPR locus 
(pCRISPR) (Fig. 1a). Co-incubation of these reagents converted the 
supercoiled plasmid into three main products: relaxed and linear plas- 
mid species and a fast-migrating species we term band X (Fig. 1b, c 
and Extended Data Fig. 1a). Product formation required Cas1, Cas2 and 
the protospacer DNA (Extended Data Fig. 1b-d), and was consistent 
with previous divalent metal ion-dependent and sequence-nonspecific 
in vitro activity requirements of Cas1 (refs 17-19) and Cas2 (refs 20-22).
Product DNA migration was not affected by treatment with EDTA,
EDTA and phenol-chloroform extraction or proteinase K in the presence
of EDTA and detergent (Extended Data Fig. 1e), indicating that
product DNAs are unlikely to be bound to Cas1 and/or Cas2. Consistent
with product DNA resulting from covalent integration of protospacer
DNA into the plasmid, the relaxed and linear forms of pCRISPR
became radiolabelled in reactions containing 32P-labelled protospacer
DNA (Fig. 1d and Extended Data Fig. 2). Although Cas1 alone catalysed
a low level of protospacer integration in the presence of Mn2+, the reaction
was enhanced substantially by the presence of Cas2 (Extended Data
Fig. 2b).

Bacteria expressing Cas1 active-site mutants, but not Cas2 active-site
mutants, are incapable of acquiring new spacers in vivo, demonstrating
the catalytic role of Cas1 during spacer acquisition'*'*"°. Consistent with
these data, Cas1 active-site mutants H208A and D221A were defective
for protospacer integration in vitro, whereas the Cas2(E9Q) active-site
mutant supported integration (Fig. 1c, e and Extended Data Fig. 3). The
Cas2 C-terminal (Δβ6-β7) deletion mutant, which is defective for complex
formation with Cas1 and spacer acquisition in vivo, failed to support
Cas1-mediated integrase activity (Fig. 1c, e). We conclude that our
in vitro assay recapitulates the in vivo functions of Cas1 and Cas2 during
spacer acquisition.


Integration and disintegration products 


We tested whether the reaction products of Cas1-Cas2-mediated DNA 
integration resemble those formed by the strand transfer activity of re- 
troviral integrases and cut-and-paste transposases***°. These enzymes 
generate two main products in vitro corresponding to half-site and full- 
site integration events (Fig. 2a). We observed similar gel mobility of the 
slowly migrating DNA product generated by Cas1-Cas2 and Nb.BbvCI 
nickase-digested pCRISPR, consistent with the slow-migrating relaxed 
DNA species corresponding to half-site products and/or products result- 
ing from full-site integration of one protospacer molecule (Extended 
Data Fig. 1a). Digestion with EcoRI, which cuts pCRISPR once, con- 
verted the reaction products to linear DNAs (Fig. 2b, compare lane 4 to 
lane 2, and Fig. 2c). We therefore conclude that both the relaxed and 
band X DNA products comprise unit-sized pCRISPR circles. 
Medical Institute, University of California, Berkeley, Berkeley, California 94720, USA. °Department of Chemistry, University of California, Berkeley, Berkeley, California 94720, USA. Physical Biosciences
Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
We observed that band X did not become radiolabelled in reactions
conducted with 32P-labelled protospacer DNA. A time-course analysis
revealed relaxed DNA product formation within the first minute, fol- 
lowed by accumulation of band X between 10 and 30 min (Fig. 2d). To 
determine the properties of band X, the purified product was analysed 
in two different types of agarose gels—one pre-stained with ethidium 
bromide, similar to the gels presented thus far, and the other stained 
with ethidium bromide after electrophoresis (post-stained) (Extended 
Data Fig. 4a). Although band X migrated as a single species in the pre- 
stained gel, a ladder of species that migrated faster than the relaxed pro- 
ducts was observed in the post-stained gel (Fig. 2e, f). These intermediates 
are reminiscent of plasmid topoisomers*””*. The same pre- and post- 
stained agarose gel analysis was performed on the entire integration 
reaction, generating similar results to those observed with purified band 
X (Extended Data Fig. 4b, c). PCR analysis of various segments of 
pCRISPR using gel-purified band X as the template yielded amplifica- 
tion products indistinguishable from those generated using unreacted 
supercoiled pCRISPR or relaxed integration products, supporting the 
conclusion that band X corresponds to pCRISPR topoisomers (Extended 
Data Fig. 4d). 

We wondered whether band X arose from protospacer excision
from half-site integration products to regenerate pCRISPR in different 
supercoiled states, analogous to the in vitro reversal of integration activity 
of retroviral integrases and transposases (termed disintegration, Fig. 2g)?”°. 
To test this hypothesis, a synthetic Y-structured DNA intermediate that 
mimics the half-site integration product (Extended Data Fig. 5a, b) was 
radiolabelled such that the liberated 33 bp protospacer DNA could 
be detected following disintegration activity. Using this substrate, we 
observed that Cas1 catalysed disintegration activity either by itself or in 
the presence of Cas2 (Fig. 2h). Disintegration activity was confirmed by
radiolabelling the 20 nucleotide (nt) target DNA strand and monitor- 
ing the formation of the joined 40 bp target DNA product (Extended 
Data Fig. 5c, d). Thus, Cas1-Cas2 integration and disintegration activ- 
ities are similar to those of retroviral integrases and transposases. 


Integration requires 3’-OH protospacer ends 


We next investigated the DNA protospacer and target DNA requirements 
for integration. Single-stranded protospacer DNA failed to support the 
reaction (Fig. 3a, b). The Cas1-Cas2 complex accommodated various
protospacer lengths in vitro despite the strict 33 bp requirement for spacer 
acquisition in vivo (Extended Data Fig. 6a), suggesting that protospacer 
length is pre-determined before integration in vivo by an unknown 
mechanism. The Cas1-Cas2 complex integrated DNA substrates with 
blunt-ends or with 3’-overhangs up to 5 nt in length (Extended Data 
Fig. 6b). In contrast to retroviral integrases”', substrates with 5’-overhangs 
were non-viable (Extended Data Fig. 6b). 

Retroviral integration and transposition reactions proceed via nu- 
cleophilic attack of DNA 3'-OH groups at target DNA phosphodie- 
ster bonds*'**. We found that phosphorylation of both 3’-ends of the 
protospacer ablated integration, whereas phosphorylation of only one 
3’ end strongly limited integration (Fig. 3a, b). By analogy to known 
integrase enzyme mechanisms, DNA integration could proceed by Cas1-catalysed
direct nucleophilic attack of the substrate 3’-OH on the target
DNA, or by formation of a Cas1-DNA intermediate, as occurs in the 
serine and tyrosine families of recombinases”’. Four tyrosine residues 
in the vicinity of the Cas1 active site’”"'? could be involved in forming 
such a covalent intermediate (Extended Data Fig. 7a, b). Purified Cas1 
mutant proteins in which each tyrosine was individually changed to 
alanine supported protospacer integration in vitro at levels comparable
to wild-type Cas1-Cas2 (Extended Data Fig. 7c). Thus, the integration
reaction likely proceeds via direct nucleophilic attack of protospacer
3’-OH ends onto the target DNA phosphodiester bonds, a mechanism
previously hypothesized to occur in vivo™*.


Supercoiled DNA and CRISPR locus requirements 


Cas1 and Cas2 overexpression leads to site-selective spacer acquisition 
proximal to the leader end of the CRISPR locus, a result consistent with 
observations in native populations of CRISPR-containing bacteria’**>”°. 
To determine what drives such site-specific integration, we first tested 
various forms of the pCRISPR plasmid to determine target DNA re- 
quirements. Integration requires target DNA supercoiling, as neither 
relaxed nor linear pCRISPR, nor the isolated 1 kb CRISPR locus, sup- 
ported integration (Fig. 3c and Extended Data Fig. 6c, d). 

As a control, we tested supercoiled pUC19 DNA, the parental plas- 
mid of pCRISPR that lacks a CRISPR locus, and were surprised to 
Figure 3 | Integration requires 3’-OH protospacer ends and supercoiled
target DNA. a, b, Integration assays using single-stranded DNAs and
either -OH or -PO4 at the 3’ or 5’ ends of unlabelled (a) or radiolabelled
(b) protospacers. S1 corresponds to one strand of the protospacer and S2
corresponds to the complementary strand. c, Comparison of protospacer
Figure 2 | Half-site, full-site integration and
pCRISPR topoisomer products. a, Schematic of
half-site and full-site integration products.
b, Linearization of the integration products (lane
4). Lane 3 contains the untreated reaction products.
c, Linearization of integration products from
radiolabelled protospacer reactions. d, The time
course reveals the initial formation of relaxed
products, followed by band X. The inset reveals
the products detected using 32P-labelled
protospacers. e, f, Analysis of gel-purified relaxed
and band X on agarose gels pre-stained with
ethidium bromide (e) or post-stained after
electrophoresis (f). g, Schematic of the
disintegration reaction. h, Native polyacrylamide
gel analysis of the disintegration reaction. The
data presented in b-f, h are representative of at
least three replicates.
observe integration products upon incubation with Cas1 and Cas2 in
the presence of protospacer DNA (Fig. 3c and Extended Data Fig. 6e).
This finding raised two possibilities: either in vitro spacer integration
is non-specific with respect to target DNA sequence, or structures and/or
sequence(s) favouring integration are present in the pUC19 plasmid.
To determine if integration preferentially occurred at the CRISPR
locus of pCRISPR, products of radiolabelled reactions were double-digested
to separate the CRISPR locus (960 bp) from the pUC19 plasmid
backbone (~2.27 kb). Suggestive of CRISPR-specific integration, the
32P radiolabel migrated solely with the CRISPR locus fragment (Fig. 3d).
The same result was observed when the experiment was conducted using 
a target plasmid containing the CRISPR locus and a different backbone 
sequence (pACYC) (Fig. 3e). 


CRISPR repeats provide specificity 


To determine the exact sites of protospacer integration in these reac- 
tions, we performed high-throughput sequencing of reaction products 
integration into different DNA targets. d, e, Restriction enzyme digestion
of pCRISPR, either in a pUC19 (d) or pACYC backbone (e), after the
integration assay detects integration into the CRISPR fragment (green arrows).
The data presented in a-e are representative of at least three replicates.
Figure 4 | Protospacers are specifically integrated into the CRISPR locus.
a, Integration sites along pCRISPR. b, Magnified view of the integration sites 
along the ~1 kb CRISPR locus. The cyan peaks represent positions where 
the 3’ T of the protospacer DNA was integrated whereas the black peaks 
represent the C 3'-OH integration events. The protospacer sequence is depicted 


that resulted from using either pCRISPR or the parental pUC19 vector 
as the target of integration (Extended Data Fig. 8a). Of the 7,866 pro- 
tospacer-pCRISPR junctions retrieved, ~71% mapped to the CRISPR 
locus (Fig. 4a and Extended Data Fig. 8b). Protospacer insertion occurred 
at the borders of each repeat, with the most preferred site at the first 
repeat adjacent to the leader (Fig. 4b). The minus strand of each repeat 
(the bottom strand in Fig. 4a, b that runs 5’ to 3’ towards the leader 
sequence) is also highly preferred, highlighting the role of CRISPR 
repeats in providing sequence specificity for the Cas1-Cas2 complex 
(Fig. 4b). Sequence alignment of the integration sites revealed strong 
preference for sequences resembling the CRISPR repeat on both strands 
of pCRISPR, further supporting the selection of CRISPR repeat bor- 
ders by the Cas1-Cas2 complex (Extended Data Fig. 8d-f). 

The most frequent integration site in the pUC19 control plasmid 
mapped to the amp resistance gene adjacent to the AT-rich promoter 
sequence (~8.8% of 5,524 total retrieved junctions, Fig. 4c and Ex- 
tended Data Fig. 8c). An inverted repeat sequence with a propensity to 
form a DNA cruciform” occurs 9 nt adjacent to this integration site 
(plus strand sequence: 5’-TTCAATATTATTGAA-3’), suggesting that 
potential DNA cruciform formation adjacent to AT-rich sequences is 
important for protospacer integration. Sequence analysis of pUC19 
target sites revealed the propensity for a G nucleotide to occur at the 
—2 and +1 positions of the protospacer insertion site, similar to the 
preferred pCRISPR sites (Extended Data Fig. 8g, h). These observa- 
tions imply that in addition to sequence, pCRISPR repeat selectivity 
stems from the unique structural features of these sites, such as their 
ability to form cruciforms (Fig. 4a, b, e). 
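
The inverted-repeat character of the pUC19 sequence quoted above can be checked with a few lines of code. This is a minimal sketch rather than a cruciform-prediction method: `stem_and_loop` is a hypothetical helper that simply pairs the two ends of a sequence inward and counts Watson-Crick matches, ignoring thermodynamics and supercoiling.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def stem_and_loop(seq: str) -> tuple[int, int]:
    """Pair the sequence ends inward; return (stem length, unpaired loop length)."""
    seq = seq.upper()
    i, j, stem = 0, len(seq) - 1, 0
    while i < j and PAIRS[seq[i]] == seq[j]:
        stem += 1
        i += 1
        j -= 1
    return stem, len(seq) - 2 * stem


# Plus-strand sequence adjacent to the preferred pUC19 integration site (from the text).
site = "TTCAATATTATTGAA"
stem, loop = stem_and_loop(site)
left_arm, right_arm = site[:stem], site[len(site) - stem:]

print(stem, loop)                                   # stem and loop lengths
print(left_arm == reverse_complement(right_arm))    # arms are mutually complementary
```

For this 15-nt sequence the two 7-nt arms are exact reverse complements separated by a single central nucleotide, which is the arrangement expected of a sequence able to extrude a hairpin or cruciform.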

In E. coli, newly acquired spacers harbour a 5’ G as the first nucle- 
otide flanking the leader-proximal end of the repeats, which originates 
from the last nucleotide of the AAG protospacer-adjacent motif 
(PAM) from foreign DNA'*>?”~*”. Such positional specificity is critical
for crRNA-guided interference, as a mutation in this position of the
corresponding crRNA disrupts PAM binding and subsequent target 
destruction’. We found that ~73% of all integration events into 
pCRISPR used the 3’ C end instead of the 3’ T end of protospacer DNA 
during integration (see Fig. 4b for protospacer sequence), and there was 
a strong preference for this nucleotide to attack the minus strand of the 
repeat sequence (Fig. 4b, d, e). A similar nucleotide bias was observed 
in the pUC19 target plasmid sequence data (Fig. 4d). This preference 
positions the G at the 5’ end of the protospacer substrate as the first 
nucleotide of the newly integrated spacer in the CRISPR locus (Fig. 5). 
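
To give a sense of how strong a ~73% bias is at this read depth, the sketch below runs a chi-square goodness-of-fit test against an even 50:50 use of the two 3’ ends. This is an illustrative assumption, not necessarily the comparison reported in the figure legend: the read counts are reconstructed from the rounded percentage in the text, and the helper `chi_square_two_categories` is hypothetical.

```python
import math


def chi_square_two_categories(observed, expected):
    """Chi-square statistic and P value for exactly two categories (1 degree of freedom)."""
    if len(observed) != 2 or len(expected) != 2:
        raise ValueError("this helper only handles two categories")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 d.f., P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


total_reads = 7866                       # pCRISPR junctions reported in the text
c_reads = round(0.73 * total_reads)      # approximate count using the 3' C end
t_reads = total_reads - c_reads          # remainder assumed to use the 3' T end

stat, p_value = chi_square_two_categories(
    [c_reads, t_reads],
    [total_reads / 2, total_reads / 2],  # null hypothesis: no end preference
)
print(f"chi-square = {stat:.1f}, P < 0.0001: {p_value < 1e-4}")
```

With several thousand reads, a split this far from 50:50 yields a very large test statistic, so the bias is unlikely to reflect sampling noise.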
above the plot. c, Integration sites along pUC19. d, Comparison of C 3’-OH
or T 3’-OH selection in the total reads from pCRISPR and pUC19 targets
(n = 7,866 reads for pCRISPR and n = 5,524 reads for pUC19, chi-square test,
*P < 0.0001). e, Schematic of DNA cruciform formation of the repeat
sequences. The orange arrows depict the cleavage sites.


When we used protospacer DNAs lacking a 3’ C or bearing 3’ C on 
both ends, the preference for integration into the minus strand of the 
CRISPR locus was significantly decreased (Extended Data Fig. 9). Thus, 
the Cas1-Cas2 complex plays a critical role in correctly orienting the C 
3'-OH end of protospacer DNA substrates for incorporation within the 
CRISPR locus. 


Mechanism of protospacer integration 

The results presented here explain the mechanistic basis for foreign DNA 
acquisition during CRISPR-Cas adaptive immunity (Fig. 5). The Cas1- 
Cas2 complex catalyses integration of protospacers at the leader-end of 
the CRISPR locus and also selects the terminal C 3'-OH as the attacking
nucleophile, resulting in the 5’ G on the opposite strand of the proto- 
spacer becoming the first nucleotide of the newly integrated spacer. This 
orientation bias, previously observed in vivo”, is a key step during 
immunity for productive downstream foreign DNA targeting by the 
Cascade complex and Cas3 effector nuclease (Extended Data Fig. 10). 
Interestingly, the presence of the complete AAG PAM in the proto- 
spacer is not required for in vitro integration, suggesting that a highly 
specific selection or processing step occurs in vivo to exclude the AA 
nucleotides from the mature protospacer before integration. 

We propose a two-step integration mechanism in which the C 3’- 
OH first attacks the minus strand of the CRISPR repeat to produce a 
half-site intermediate (Fig. 5). The 3’-OH on the opposite strand of the 
integrating DNA then attacks the target DNA 28 bp away on the oppo- 
site side of the repeat on the plus strand, leading to full integration of 
the protospacer (Fig. 5). Our in vitro system predominantly traps the 
first step of this two-step integration mechanism, suggesting that the 
second nucleophilic attack is greatly accelerated in vivo in the presence 
of cellular factors. This model is consistent with spacer integration 
intermediates that are observed in vivo, in which protospacers are inte- 
grated such that staggered cleavage at each end of the repeat generates 
single-stranded gaps that ensure repeat duplication**. The in vivo con- 
ditions could also promote the high specificity of integration to occur 
solely downstream of the first repeat of the CRISPR locus in E. coli, 
instead of at every repeat, as observed in our in vitro assay. 
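
The net outcome of the proposed two-step mechanism can be summarized with a string-level toy model. The sketch below is an illustration under stated assumptions, not a simulation of the chemistry: all sequences are placeholders rather than E. coli sequences, the function names are hypothetical, and gap filling and ligation are collapsed into a single step that duplicates the 28 bp repeat on either side of the new spacer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_spacer(strand: str):
    """Pick the strand with a 5' G, whose partner strand therefore ends in a 3' C.

    Returns None when both strands or neither strand qualify, mirroring the
    reduced orientation preference described for such substrates.
    """
    partner = reverse_complement(strand)
    strand_has_5g = strand.startswith("G")
    partner_has_5g = partner.startswith("G")
    if strand_has_5g == partner_has_5g:
        return None
    return strand if strand_has_5g else partner


def integrate(leader: str, repeat: str, downstream: str, protospacer: str) -> str:
    """Insert an oriented spacer after the first repeat and duplicate that repeat."""
    spacer = orient_spacer(protospacer)
    if spacer is None:
        raise ValueError("protospacer orientation is ambiguous in this model")
    return leader + repeat + spacer + repeat + downstream


# Placeholder sequences (hypothetical; only the lengths follow the text).
LEADER = "AT" * 10            # AT-rich leader stand-in
REPEAT = "ACGT" * 7           # 28 nt repeat stand-in
DOWNSTREAM = "TTG" * 11       # an existing 33 nt spacer stand-in
PROTOSPACER = "G" + "ACGT" * 8  # 33 nt, 5' G on this strand, 3' C on the partner

new_locus = integrate(LEADER, REPEAT, DOWNSTREAM, PROTOSPACER)
print(len(new_locus) - len(LEADER + REPEAT + DOWNSTREAM))  # spacer plus one repeat
print(new_locus[len(LEADER) + len(REPEAT)])                # first nucleotide of new spacer
```

In this model each acquisition event lengthens the array by one spacer plus one repeat, and the leader-proximal nucleotide of the new spacer is the 5’ G regardless of which strand of the protospacer is supplied.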

CRISPR spacer integration shares mechanistic similarities with ret- 
roviral integration and DNA transposition, where the integrase/trans- 
posase enzyme uses donor DNA 3’-OH ends to make a staggered cut 
at the DNA target site, which concurrently joins the donor DNA to tar- 
get DNA 5’-phosphates*’””. Completion of the integration reaction 
requires a DNA polymerase to fill in sequence gaps and a DNA ligase 
to seal the phosphodiester backbone”. Similar polymerase and ligase 
functions are required to complete CRISPR spacer acquisition in vivo, 
although the specific enzymes involved have not yet been identified. 
Despite these similarities, we note that the Cas1 active site does not 
harbour the RNase H fold that defines the retroviral integrase enzyme 
superfamily**. This structural difference could explain the unexpected 
production of different topoisomers of pCRISPR (band X) in vitro, although 
the physiological significance of band X production remains unclear. 

Our results highlight the fundamental role of repeat sequences at 
multiple stages of CRISPR-Cas adaptive immunity. In addition to cre- 
ating structures within nascent CRISPR transcripts that ensure correct 
RNA processing during crRNA maturation”, the repeats operate at the 
DNA level to recruit the Cas1-Cas2 complex for sequence- and structure- 
specific protospacer integration. We envision that this recruitment 
involves transient DNA cruciform formation within the CRISPR inverted 
repeats that occurs as a function of target DNA supercoiling**. The observation
that a preferred non-CRISPR site of Cas1-Cas2-mediated
DNA integration is proximal to an inverted repeat adjacent to an AT-rich
sequence suggests the fascinating possibility that CRISPR loci arise
in naive genomes through integration events that become self-propagating
through creation of repetitive sequences with properties that ensure continual
recognition and activity by the Cas1-Cas2 integration machinery.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 


Received 6 November 2014; accepted 15 January 2015. 
Published online 18 February 2015. 


1. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709-1712 (2007).

2. van der Oost, J., Westra, E. R., Jackson, R. N. & Wiedenheft, B. Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nature Rev. Microbiol. 12, 479-492 (2014).

3. Mojica, F. J., Diez-Villasenor, C., Garcia-Martinez, J. & Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 60, 174-182 (2005).

4. Bolotin, A., Quinquis, B., Sorokin, A. & Ehrlich, S. D. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151, 2551-2561 (2005).

5. Pourcel, C., Salvignol, G. & Vergnaud, G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151, 653-663 (2005).

6. Stern, A., Keren, L., Wurtzel, O., Amitai, G. & Sorek, R. Self-targeting by CRISPR: gene regulation or autoimmunity? Trends in Genet. 26, 335-340 (2010).

7. Carte, J., Wang, R., Li, H., Terns, R. M. & Terns, M. P. Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes. Genes Dev. 22, 3489-3496 (2008).

8. Haurwitz, R. E., Jinek, M., Wiedenheft, B., Zhou, K. & Doudna, J. A. Sequence- and structure-specific RNA processing by a CRISPR endonuclease. Science 329, 1355-1358 (2010).

9. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 471, 602-607 (2011).

10. Brouns, S. J. et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321, 960-964 (2008).

11. Garneau, J. E. et al. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67-71 (2010).

12. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821 (2012).

13. Yosef, I., Goren, M. G. & Qimron, U. Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res. 40, 5569-5576 (2012).

14. Datsenko, K. A. et al. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nature Commun. 3, 945 (2012).

15. Swarts, D. C., Mosterd, C., van Passel, M. W. & Brouns, S. J. CRISPR interference directs strand specific spacer acquisition. PLoS ONE 7, e35888 (2012).

16. Nuñez, J. K. et al. Cas1-Cas2 complex formation mediates spacer acquisition during CRISPR-Cas adaptive immunity. Nature Struct. Mol. Biol. 21, 528-534 (2014).

17. Wiedenheft, B. et al. Structural basis for DNase activity of a conserved protein implicated in CRISPR-mediated genome defense. Structure 17, 904-912 (2009).

18. Babu, M. et al. A dual function of the CRISPR-Cas system in bacterial antivirus immunity and DNA repair. Mol. Microbiol. 79, 484-502 (2011).

19. Kim, T. Y., Shin, M., Huynh Thi Yen, L. & Kim, J. S. Crystal structure of Cas1 from Archaeoglobus fulgidus and characterization of its nucleolytic activity. Biochem. Biophys. Res. Commun. 441, 720-725 (2013).

20. Beloglazova, N. et al. A novel family of sequence-specific endoribonucleases associated with the clustered regularly interspaced short palindromic repeats. J. Biol. Chem. 283, 20361-20371 (2008).

21. Samai, P., Smith, P. & Shuman, S. Structure of a CRISPR-associated protein Cas2 from Desulfovibrio vulgaris. Acta Crystallogr. Sect. F Struct. Biol. Cryst. Commun. 66, 1552-1556 (2010).

22. Nam, K. H. et al. Double-stranded endonuclease activity in Bacillus halodurans clustered regularly interspaced short palindromic repeats (CRISPR)-associated Cas2 protein. J. Biol. Chem. 287, 35943-35952 (2012).

23. Li, M. & Craigie, R. Processing of viral DNA ends channels the HIV-1 integration reaction to concerted integration. J. Biol. Chem. 280, 29334-29339 (2005).

24. Cherepanov, P. LEDGF/p75 interacts with divergent lentiviral integrases and modulates their enzymatic activity in vitro. Nucleic Acids Res. 35, 113-124 (2007).

25. Hare, S. et al. A novel co-crystal structure affords the design of gain-of-function lentiviral integrase mutants in the presence of modified PSIP1/LEDGF/p75. PLoS Pathog. 5, e1000259 (2009).

26. Yang, J. Y., Jayaram, M. & Harshey, R. M. Positional information within the Mu transposase tetramer: catalytic contributions of individual monomers. Cell 85, 447-455 (1996).

27. Dinardo, S., Voelkel, K. A., Sternglanz, R., Reynolds, A. E. & Wright, A. Escherichia coli DNA topoisomerase I mutants have compensatory mutations in DNA gyrase genes. Cell 31, 43-51 (1982).

28. Pruss, G. J., Manes, S. H. & Drlica, K. Escherichia coli DNA topoisomerase I mutants: increased supercoiling is corrected by mutations near gyrase genes. Cell 31, 35-42 (1982).

29. Chow, S. A., Vincent, K. A., Ellison, V. & Brown, P. O. Reversal of integration and DNA splicing mediated by integrase of human immunodeficiency virus. Science 255, 723-726 (1992).

30. Au, T. K., Pathania, S. & Harshey, R. M. True reversal of Mu integration. EMBO J. 23, 3408-3420 (2004).

31. Engelman, A., Mizuuchi, K. & Craigie, R. HIV-1 DNA integration: mechanism of viral DNA cleavage and DNA strand transfer. Cell 67, 1211-1221 (1991).

32. Mizuuchi, K. & Adzuma, K. Inversion of the phosphate chirality at the target site of Mu DNA strand transfer: evidence for a one-step transesterification mechanism. Cell 66, 129-140 (1991).

33. Curcio, M. J. & Derbyshire, K. M. The outs and ins of transposition: from mu to kangaroo. Nature Rev. Mol. Cell Biol. 4, 865-877 (2003).

34. Arslan, Z., Hermanns, V., Wurm, R., Wagner, R. & Pul, U. Detection and characterization of spacer integration intermediates in type I-E CRISPR-Cas system. Nucleic Acids Res. 42, 7884-7893 (2014).

35. Tyson, G. W. & Banfield, J. F. Rapidly evolving CRISPRs implicated in acquired resistance of microorganisms to viruses. Environ. Microbiol. 10, 200-207 (2008).

36. Sheflin, L. G. & Kowalski, D. Altered DNA conformations detected by mung bean nuclease occur in promoter and terminator regions of supercoiled pBR322 DNA. Nucleic Acids Res. 13, 6137-6154 (1985).

37. Goren, M. G., Yosef, I., Auster, O. & Qimron, U. Experimental definition of a clustered regularly interspaced short palindromic duplicon in Escherichia coli. J. Mol. Biol. 423, 14-16 (2012).

38. Savitskaya, E., Semenova, E., Dedkov, V., Metlitskaya, A. & Severinov, K. High-throughput analysis of type I-E CRISPR/Cas spacer acquisition in E. coli. RNA Biol. 10, 716-725 (2013).

39. Shmakov, S. et al. Pervasive generation of oppositely oriented spacers during CRISPR adaptation. Nucleic Acids Res. 42, 5907-5916 (2014).

40. Deveau, H. et al. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190, 1390-1400 (2008).

41. Semenova, E. et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc. Natl Acad. Sci. USA 108, 10098-10103 (2011).

42. Westra, E. R. et al. Type I-E CRISPR-cas systems discriminate target from non-target DNA through base pairing-independent PAM recognition. PLoS Genet. 9, e1003742 (2013).

43. Craigie, R. & Bushman, F. D. HIV DNA integration. Cold Spring Harbor Perspect. Med. 2, a006890 (2012).

44. Nowotny, M. Retroviral integrase superfamily: the structural perspective. EMBO Rep. 10, 144-151 (2009).

45. Hochstrasser, M. L. & Doudna, J. A. Cutting it close: CRISPR-associated endoribonuclease structure and function. Trends Biochem. Sci. 40, 58-66 (2015).

46. Palecek, E. Local supercoil-stabilized DNA structures. Crit. Rev. Biochem. Mol. Biol. 26, 151-226 (1991).


Acknowledgements We are grateful to M. Chung, P. J. Kranzusch and A. V. Wright for
technical assistance and members of the Doudna laboratory and J. Cate for
discussions. This project was funded by US National Science Foundation grant no.
1244557 to J.A.D. and by NIH grant AI070042 to A.E. This work used the Vincent
J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10
Instrumentation Grants S10RR029668 and S10RR027303. J.K.N. is supported by a
US National Science Foundation Graduate Research Fellowship and a UC Berkeley
Chancellor's Graduate Fellowship. A.S.Y.L. is supported as an American Cancer Society
Postdoctoral Fellow (PF-14-108-01-RMC). J.A.D. is an Investigator of the Howard
Hughes Medical Institute and a member of the Center for RNA Systems Biology.


Author Contributions J.K.N. performed the biochemical experiments. A.S.Y.L. 
processed and analysed the high-throughput sequencing data. J.K.N., A.S.Y.L., A.E. and
J.A.D. designed the study, analysed the data and wrote the manuscript. 


Author Information Sequencing data are deposited in Gene Expression Omnibus 
under accession number GSE64552. Reprints and permissions information is 
available at www.nature.com/reprints. The authors declare competing financial 
interests: details are available in the online version of the paper. Readers are welcome 
to comment on the online version of the paper. Correspondence and requests for 
materials should be addressed to J.A.D. (doudna@berkeley.edu). 


METHODS 


Cas1, Cas2 and DNA preparation. The cas1 and cas2 genes from E. coli K12 
(MG1655) were cloned into expression vectors and the proteins were separately 
purified as previously described (ref. 16). The proteins were stored in 100 mM KCl, 20 mM
HEPES-NaOH, 5% glycerol and 1 mM TCEP at −80 °C before use. Single-stranded
DNAs were synthesized (Integrated DNA Technologies). Double-stranded DNA
protospacers were annealed in 20 mM HEPES-NaOH, pH 7.5, 25 mM KCl, 10 mM
MgCl2 or MnCl2, 1 mM DTT, 10% DMSO by heating at 95 °C for 3 min and slow
cooling to room temperature. The sequence of the 33 bp protospacer used in this 
study was shown to be the most acquired in vivo in E. coli K12 after M13 bacterio- 
phage infection’: strand 1 (5’-GCCCAATTTACTACTCGTTCTGGTGTTTCT 
CGT-3') and strand 2 (5’-ACGAGAAACACCAGAACGAGTAGTAAATTGG 
GC-3'). The pCRISPR target plasmid was constructed by PCR amplifying the 
E. coli BL21-AI genomic CRISPR locus and cloning the fragment into pUC19 using 
the following primers with the underlines indicating the respective restriction 
sites used: forward/EcoRI: 5’-ACGTCGAATTCTACCTTTTTAATCAATGG-3’ 
and reverse/AflIII: 5'-ACGTCACATGTGGTTATATGGTGGTTTATCC-3’. The 
pACYC CRISPR plasmid was constructed by cloning the CRISPR fragment into a 
pACYCDuet-1 vector using the EcoNI and AvrII restriction sites.
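
The complementarity of the two protospacer strands and the position of the restriction sites in the cloning primers can be checked with a few lines of Python. This is a minimal sketch, not part of the published protocol: the sequences are copied from the text above, whereas the helper name `reverse_complement` and the constant names are illustrative assumptions. ACATGT is used as the one instance of the degenerate AflIII site (ACRYGT) that occurs in the reverse primer.

```python
# Sequences copied from the Methods text; all names below are hypothetical helpers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


STRAND_1 = "GCCCAATTTACTACTCGTTCTGGTGTTTCTCGT"
STRAND_2 = "ACGAGAAACACCAGAACGAGTAGTAAATTGGGC"

FORWARD_PRIMER = "ACGTCGAATTCTACCTTTTTAATCAATGG"      # forward/EcoRI
REVERSE_PRIMER = "ACGTCACATGTGGTTATATGGTGGTTTATCC"    # reverse/AflIII

ECORI_SITE = "GAATTC"
AFLIII_SITE = "ACATGT"  # one instance of the degenerate ACRYGT recognition site

if __name__ == "__main__":
    print("protospacer length:", len(STRAND_1))
    print("strands fully complementary:", reverse_complement(STRAND_1) == STRAND_2)
    print("EcoRI site starts at index:", FORWARD_PRIMER.find(ECORI_SITE))
    print("AflIII site starts at index:", REVERSE_PRIMER.find(AFLIII_SITE))
```

In both primers the restriction site follows a five-nucleotide 5' extension, which is presumably the portion that was underlined in the typeset version.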

In vitro integration assays. The integration reactions were performed in 20 mM 
HEPES-NaOH, pH 7.5, 25 mM KCl, 10 mM MgCl2 or MnCl2, 1 mM DTT and 10%
DMSO. There was little difference when DMSO was omitted from the reaction 
(Fig. 1d), in contrast to its in vitro integration enhancement with HIV-1 integrase’”. 
All of the reactions were conducted with MgCl2 unless otherwise noted. For reactions
with the Cas1-Cas2 complex, separately purified Cas1 and Cas2 were pre-incubated
for 20-30 min at 4 °C to allow complex formation. The protospacer DNAs
were incubated with the protein(s) for 10-15 min at 4 °C, followed by the addition 
of the target pCRISPR or pUC19 plasmid DNA. The reactions were conducted at 
37 °C for 1 h and quenched with DNA loading buffer containing a final concentra- 
tion of 50 mM EDTA. The products were analysed on 1.5% agarose gels pre-stained 
with ethidium bromide. All of the reactions, except those shown in Fig. 1 and Ex- 
tended Data Fig. 1a, c—e, were conducted with 75 nM protein, 200 nM protospa- 
cers and 7.5 nM pCRISPR to clearly visualize band X from pCRISPR. Reactions in 
Fig. 1 and Extended Data Fig. 1a, c, e were performed with 50 nM protospacers. 
Each integration and disintegration assay was performed a minimum of three times. 
Radiolabelled protospacer integration assays. Pre-annealed double-stranded protospacer
DNA substrates were 5'-radiolabelled using [γ-32P]-ATP (PerkinElmer)
and T4 polynucleotide kinase (New England Biolabs). Protospacers with 3'-PO4
ends were 5'-radiolabelled using T4 polynucleotide kinase with 3' phosphatase
minus activity (New England Biolabs). The reactions were carried out in the same
buffer as above. Unless otherwise noted, 200 nM of Cas1-Cas2 was first incubated
with 20 nM protospacers at 4 °C for 10-15 min, followed by the addition of 200 ng
(~5 nM) of pCRISPR. The reactions were conducted at 37 °C for 1 h and quenched
with 25 mM EDTA and 0.4% SDS. The DNA samples were deproteinized with
30 µg of proteinase K for 1 h at 37 °C and ethanol precipitated. The reactions were
analysed on 1.5% agarose gels. After electrophoresis, the gels were dried onto pos- 
itively charged nylon transfer membrane (GE Healthcare) and imaged using Phos- 
phor Screens (GE Healthcare). The restriction enzyme digest experiments were 
performed by first conducting the integration reaction, followed by addition of the 
respective enzymes (New England Biolabs), which were allowed to digest for an 
additional 1 h at 37 °C. 
Disintegration assays. The four single-stranded DNA substrates were annealed
to form the Y DNA in a stepwise manner: 95 °C for 3 min, 65 °C for 20 min, 50 °C
for 20 min, and gradual cooling to room temperature. The annealing reactions 
were analysed on a 15% native polyacrylamide gel to confirm the formation of the 
Y DNA (Extended Data Fig. 5b). The disintegration assay was performed in the 
integration reaction buffer with 50 nM protein and 5 nM Y DNA at 37 °C for 1 h.
For native polyacrylamide gel analysis, the reaction was quenched with DNA loading
buffer with 50 mM EDTA and analysed on 15% polyacrylamide gels. For denaturing
polyacrylamide gel analysis, the reactions were quenched with formamide
buffer and heated at 95 °C before loading on 15% 8 M urea-polyacrylamide gels.
The sequences of the four strands are as follows: A (5’-GGCCCCAGTGCTGCA 
ATGAT-3’); B (5'-GTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGG 
GGCC-3'); C (5'-GCCCAATTTACTACTCGTTCTGGTGTTTCTCGTACCGC 
GAGACCCACGCTCAC-3’); and D (5'-ACGAGAAACACCAGAACGAGTAG 
TAAATTGGGC-3’). 
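
The base pairing that holds the four strands together in the Y DNA can be read off the sequences themselves. The sketch below is an illustration under stated assumptions rather than the authors' own analysis: the sequences are copied from the text, while the helper names and the assignment of the three duplex arms (A with the 3' half of B, the 5' half of B with the 3' end of C, and D with the 5' end of C) are inferred from complementarity alone.

```python
# Strand sequences copied from the Methods; helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals(x: str, y: str) -> bool:
    """True if x and y form a perfect antiparallel duplex."""
    return reverse_complement(x) == y


A = "GGCCCCAGTGCTGCAATGAT"
B = "GTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCC"
C = "GCCCAATTTACTACTCGTTCTGGTGTTTCTCGTACCGCGAGACCCACGCTCAC"
D = "ACGAGAAACACCAGAACGAGTAGTAAATTGGGC"

# Inferred arms of the Y-shaped substrate (an assumption based on the sequences).
ARMS = {
    "A / 3' half of B": anneals(A, B[-20:]),
    "5' half of B / 3' end of C": anneals(B[:20], C[-20:]),
    "D / 5' end of C (protospacer arm)": anneals(D, C[:33]),
}

if __name__ == "__main__":
    for arm, paired in ARMS.items():
        print(f"{arm}: {'paired' if paired else 'NOT paired'}")
```

Strand D and the first 33 nucleotides of strand C are the two protospacer strands used in the integration assays, so the substrate mimics a protospacer joined to target DNA through strand C.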

High-throughput sequencing. The integration reaction was performed with 75 nM 
Cas1-Cas2, 200 nM protospacer and 7.5 nM pCRISPR or pUC19 in 20 mM HEPES,
pH 7.5, 25 mM KCl, 10 mM MgCl2, 10% DMSO and 1 mM DTT. The DNAs were
isolated by phenol-chloroform extraction and ethanol precipitation. The excess 
protospacers were removed using 100K MWCO Amicon Ultra-0.5 ml centrifugal 
filters. The resulting integration products were digested into smaller DNA frag- 
ments using dsDNA Fragmentase (New England Biolabs) for 75 min at 37 °C and 
quenched at 65 °C for 15 min. Fragments were end repaired using T4 DNA Poly- 
merase (NEB), Klenow (NEB) and T4 PNK (NEB) and A-tailed with Klenow exo 
(3' to 5’ exo minus) (NEB). Adapters were ligated onto fragments using T4 DNA 
ligase (NEB) and cDNA libraries were amplified by PCR using Phusion (NEB). 
Libraries were sequenced on an Illumina HiSeq2500 on rapid run mode. The
oligonucleotides used are: universal adaptor:
5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3'
(*phosphorothioate bond); indexed adaptor:
5'-/5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-index-ATCTCGTATGCCGTCTTCTGCTTG-3';
PCR primers: 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA-3',
5'-CAAGCAGAAGACGGCATACGAGAT-3'.

Computational analysis. For preprocessing, 3’ adapters were removed from raw 
Illumina reads using Cutadapt (http://code.google.com/p/cutadapt/), discarding 
reads shorter than 15 nt. Reads containing integrated protospacer were selected 
using Cutadapt, requiring the presence of at least 10 nt of protospacer sequence 
with no errors. After creating Bowtie“ indexes from fasta files of the pUC19 empty 
and pCRISPR plasmid sequences, these reads were mapped to the respective plas- 
mids using Bowtie, allowing up to 2 mismatches and requiring unique mapping. 
Sequence motif analysis depicted in Extended Data Fig. 8 were generated using 
WebLogo, using integration sites that are represented at least ten times in the 
sequencing data”. 
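
The read-selection logic described above can be illustrated with a small, dependency-free sketch. It is not Cutadapt and does not reproduce its alignment algorithm; it only mirrors the stated thresholds (reads shorter than 15 nt are discarded, and at least 10 nt of protospacer must be present with no errors). The function names are hypothetical, and the use of the first 12 nt of the indexed adaptor as the 3' adapter is an assumption made for the example.

```python
# Protospacer strand 1 and the start of the indexed adaptor, copied from the Methods.
PROTOSPACER = "GCCCAATTTACTACTCGTTCTGGTGTTTCTCGT"
ADAPTER_3PRIME = "GATCGGAAGAGC"  # assumption: first 12 nt of the indexed adaptor
MIN_READ_LENGTH = 15             # reads shorter than this are discarded
MIN_PROTOSPACER_MATCH = 10       # exact protospacer match required, in nt


def trim_3prime_adapter(read: str, adapter: str = ADAPTER_3PRIME) -> str:
    """Remove the adapter and everything 3' of it (exact match only)."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]


def contains_protospacer(read: str,
                         protospacer: str = PROTOSPACER,
                         min_match: int = MIN_PROTOSPACER_MATCH) -> bool:
    """True if the read shares an error-free stretch of min_match nt with the protospacer."""
    kmers = {protospacer[i:i + min_match]
             for i in range(len(protospacer) - min_match + 1)}
    return any(read[i:i + min_match] in kmers
               for i in range(len(read) - min_match + 1))


def select_integration_reads(reads):
    """Trim adapters, drop short reads, keep reads that carry protospacer sequence."""
    kept = []
    for read in reads:
        trimmed = trim_3prime_adapter(read)
        if len(trimmed) < MIN_READ_LENGTH:
            continue
        if contains_protospacer(trimmed):
            kept.append(trimmed)
    return kept
```

Only the top strand of the protospacer is checked here; a fuller version would also test its reverse complement, and the retained reads would then be mapped to the plasmid with Bowtie as described.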

Sample size. No statistical methods were used to predetermine sample size. 
bromide staining was performed after electrophoresis. d, PCR amplification three replicates.
Extended Data Figure 6 | Cas1-Cas2 can integrate various lengths of
double-stranded DNA with blunt- or 3’-overhang ends into a supercoiled 
target plasmid. a, Integration assays using the indicated lengths of protospacer 
DNA. b, Integration assays using varying 5' or 3’ overhang lengths. c, d, A 
pCRISPR target. e, Integration assay using different target plasmids with or
without a CRISPR locus. The green arrows correspond to the relaxed product
of each target and the cyan arrows correspond to the band X product.
The data presented in a-e are representative of at least three replicates.
Extended Data Figure 8 | High-throughput sequencing of integration
products reveals sequence-specific integration. a, Schematic of the workflow
for high-throughput sequencing analysis of the integration sites. b, Raw map
of the total reads along pCRISPR before collapsing into single peaks of
protospacer-pCRISPR junctions depicted in Fig. 4. c, Same as b, except for the
pUC19 target. d, Sequence of the leader-end of the CRISPR locus in E. coli.
e, f, WebLogo analysis from the −5 to +5 positions surrounding the
protospacer integration sites on the plus (e) and minus (f) strands of pCRISPR.
The arrow points to the nucleotide that is covalently joined to the protospacer.
g, h, Same as e, f, except for the pUC19 target.
strand-specific recognition of the 3'-TTC-5' PAM sequence in the target strand
by the crRNA-guided Cascade complex. Incorrect protospacer integration
(right) cannot initiate foreign DNA destruction due to the inability of the
crRNA to recognize the strand with the 3'-TTC-5' PAM. Thus, foreign DNA
interference during CRISPR-Cas adaptive immunity relies on the Cas1-Cas2
Cas9 specifies functional viral targets
during CRISPR-Cas adaptation


Robert Heler, Poulami Samai, Joshua W. Modell, Catherine Weiner, Gregory W. Goldberg, David Bikard
& Luciano A. Marraffini


Clustered regularly interspaced short palindromic repeat (CRISPR) loci and their associated (Cas) proteins provide 
adaptive immunity against viral infection in prokaryotes. Upon infection, short phage sequences known as spacers inte- 
grate between CRISPR repeats and are transcribed into small RNA molecules that guide the Cas9 nuclease to the viral 
targets (protospacers). Streptococcus pyogenes Cas9 cleavage of the viral genome requires the presence of a 5’-NGG-3’ 
protospacer adjacent motif (PAM) sequence immediately downstream of the viral target. It is not known whether and 
how viral sequences flanked by the correct PAM are chosen as new spacers. Here we show that Cas9 selects functional 
spacers by recognizing their PAM during spacer acquisition. The replacement of cas9 with alleles that lack the PAM 
recognition motif or recognize an NGGNG PAM eliminated or changed PAM specificity during spacer acquisition, respec- 
tively. Cas9 associates with other proteins of the acquisition machinery (Cas1, Cas2 and Csn2), presumably to provide 
PAM-specificity to this process. These results establish a new function for Cas9 in the genesis of prokaryotic
immunological memory.


CRISPR loci and Cas proteins provide adaptive immunity to bacteria 
and archaea against their viruses’. To adapt to highly dynamic viral 
populations, CRISPR-Cas loci evolve rapidly, acquiring short phage 
sequences, known as spacers, that integrate between CRISPR repeats 
and constitute a memory record of infection’. Spacers are tran- 
scribed into small CRISPR RNAs (crRNAs) that identify viral targets 
(defined as protospacers) by direct Watson-Crick pairing with invas- 
ive DNA’. Based on their cas gene content, CRISPR-Cas systems can 
be classified into three distinct types: I, II and III (ref. 4). Each CRISPR- 
Cas type possesses different mechanisms of crRNA biogenesis, target 
destruction and prevention of autoimmunity. In the type II CRISPR-Cas 
system present in Streptococcus pyogenes the Cas9 nuclease inactivates 
infective phages using crRNAs as guides to introduce double-strand 
DNA breaks into the viral genome’. Cas9 cleavage requires the presence 
of a protospacer adjacent motif (PAM) sequence immediately down- 
stream of the protospacer®’. This requirement avoids the cleavage of 
the spacer sequence within the CRISPR array, that is, autoimmunity, as 
the adjacent repeat lacks a PAM sequence. The importance of the PAM 
sequence for target recognition and cleavage°”° suggests the presence of 
a mechanism to ensure that newly acquired spacer sequences match 
protospacers flanked by a proper PAM sequence. For the type I-E CRISPR- 
Cas system of Escherichia coli, overexpression of cas1 and cas2 is suf- 
ficient for the acquisition of new spacers in the absence of phage 
infection. Reports indicate that spacers acquired in this fashion match 
preferentially (25-70%, depending on the study) to protospacers with 
the correct PAM (AWG, W = A/T)'*®, suggesting that Cas1 and Cas2
are sufficient for spacer acquisition and have some intrinsic ability to 
recognize protospacers with the right PAM. In the type II system of S. 
pyogenes, the PAM sequence is NGG (and also NAG at a much lower 
frequency)**"4, where N is any nucleotide; this motif is recognized and 
bound by a domain within the Cas9 nuclease during target cleavage”">. 
How spacers are acquired in this system, particularly how spacers with 
correct PAM sequences are selected during this process, is not known. 


Cas9 is required for spacer acquisition 

To investigate the mechanisms of recognition of PAM-adjacent pro- 
tospacers during spacer acquisition, we cloned the type II-A CRISPR- 
Cas locus of S. pyogenes (Fig. 1a) into the staphylococcal vector pC194 
(ref. 16) and introduced the resulting plasmid (pWJ40 (ref. 17)) into
Staphylococcus aureus RN4220 (ref. 18), a strain lacking CRISPR-Cas 
loci. We chose this experimental system because it facilitates the genetic 
manipulation of the S. pyogenes CRISPR-Cas system. We first tested the 
ability of the cells to mount adaptive CRISPR immunity by infecting 
Figure 1 | Cas9 is required for spacer acquisition. a, Organization of the
S. pyogenes type II CRISPR-Cas locus. Arrows indicate the annealing position
of the primers used to check for the expansion of the CRISPR array. b, PCR-based
analysis of liquid cultures to check for the acquisition of new spacer
sequences in the presence or the absence of phage ϕNM4γ4 infection. Wild
type (WT) as well as different cas mutants were analysed. Image is
representative of three technical replicates. m.o.i., multiplicity of infection.
c, Cultures overexpressing Cas1, Cas2 and Csn2 under the control of a
tetracycline-inducible promoter were analysed using PCR for spacer acquisition
in the absence of phage infection. The strain was complemented with plasmids
carrying either Streptococcus thermophilus (St) or S. pyogenes (Sp) Cas9 (see
Extended Data Fig. 3), in the last case with or without the tracrRNA gene
(Δtracr). Image is representative of three technical replicates. aTc,
anhydrotetracycline.
them with the staphylococcal phage ϕNM4γ4, a lytic variant of ϕNM4
(ref. 19) (see Methods for a description of ϕNM4γ4 isolation). Plate-
based assays performed by mixing bacteria and phage in top agar allowed 
the selection of phage-resistant colonies that were checked by PCR to 
look for the expansion of the CRISPR array (Extended Data Fig. 1a). 
On average 50% of the colonies acquired one or more spacers (8/13, 
5/11 and 7/16 in three independent experiments), whereas the rest of the 
resistant colonies survived phage infection by a non-CRISPR mecha- 
nism, most likely including phage receptor mutations (Extended Data 
Fig. 2a). To maximize the capture of new spacer sequences, we per- 
formed the same assay in liquid and recovered surviving bacteria at the 
end of the phage challenge. These were analysed by PCR of the CRISPR 
array and the amplification products of expanded loci were subjected 
to Illumina MiSeq sequencing to determine the extent of spacer acquisition.
Analysis of 2.96 million reads detected protospacers adjacent to
2,083 out of 2,687 NGG sequences present in the viral genome, although 
with variation in the frequency of acquisition of each sequence (Extended 
Data Fig. 1b). The data revealed a prominent selection of spacers match- 
ing protospacers with downstream NGG PAM sequences (99.97%, 
Extended Data Fig. 1c). The acquisition of new spacers by cells in liquid 
culture proved to be simple and highly efficient, providing the possibil- 
ity to look at millions of new spacers in a single step. It was therefore 
used in the rest of our studies. 
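
The proportions quoted in this paragraph can be reproduced directly from the reported counts. The snippet below is a worked check only; the variable names are illustrative, and whether the "on average 50%" figure refers to the mean of the three per-experiment fractions or to the pooled fraction is not stated, so both are computed.

```python
from fractions import Fraction

# (colonies with expanded CRISPR array, resistant colonies screened) per experiment
EXPERIMENTS = [(8, 13), (5, 11), (7, 16)]

per_experiment = [Fraction(hits, total) for hits, total in EXPERIMENTS]
mean_fraction = sum(per_experiment) / len(per_experiment)
pooled_fraction = Fraction(sum(h for h, _ in EXPERIMENTS),
                           sum(t for _, t in EXPERIMENTS))

# NGG sequences in the phage genome with at least one adjacent protospacer detected
ngg_coverage = Fraction(2083, 2687)

if __name__ == "__main__":
    print("per experiment:", [f"{float(f):.1%}" for f in per_experiment])
    print(f"mean of experiments: {float(mean_fraction):.1%}")
    print(f"pooled colonies:     {float(pooled_fraction):.1%}")
    print(f"NGG sites sampled:   {float(ngg_coverage):.1%}")
```

Both ways of averaging give about 50%, and roughly three quarters of the NGG sequences in the viral genome were sampled at least once.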

To determine the genetic requirements for spacer acquisition, we 
made individual deletions of cas1, cas2 or csn2 and challenged the mutant 
strains with phage ϕNM4γ4. Spacer acquisition was decreased to levels
below our limit of detection in each of these mutants (Fig. 1b), corrob- 
orating previous experiments'*”°. Therefore although Cas1, Cas2 and 
Csn2 are dispensable for anti-phage immunity in the presence of a pre-existing
spacer (Extended Data Fig. 2b, c), they are required for spacer
acquisition. To determine whether these genes are also sufficient for 
this process, we overexpressed cas1, cas2 and csn2 in the absence of cas9
using a tetracycline-inducible promoter in plasmid pRH223 and looked 
for the integration of new spacers in the absence of phage infection 
using a highly sensitive PCR assay (Extended Data Fig. 3). We were unable 
to detect new spacers even in the presence of the inducer (Fig. 1c). 
However, the addition of a second plasmid expressing tracrRNA (see 
below) and Cas9 from their native promoters (Extended Data Fig. 3) 
enabled spacer acquisition only in the presence of the inducer, with all 
the new spacers matching chromosomal or plasmid sequences (Fig. 1c 
and Extended Data Table 1). Although it is most likely that the acquisi- 
tion of such spacers causes cell death or plasmid curing, respectively, 
the acquisition event can still be detected in liquid culture using our 
highly sensitive PCR assay (Extended Data Fig. 3b, c). The tracrRNA
(Fig. 1a) is a small RNA bound by Cas9 that is required for crRNA 
processing’ and Cas9 nuclease activity®. We wondered if Cas9 involve- 
ment in spacer acquisition also required the presence of the tracrRNA. 
Deletion of the tracrRNA prevented spacer acquisition in the absence 
of phage infection (Fig. 1c), suggesting that apo-Cas9 is not sufficient to 
promote spacer acquisition and that association with its cofactor is also 
required. Altogether these data indicate that Cas1, Cas2 and Csn2 are 
necessary but not sufficient for the incorporation of new spacers and 
that a tracrRNA-Cas9 complex is also required. This is in contrast to 
the type I-E CRISPR-Cas system of E. coli, in which overexpression of 
Cas1 and Cas2 alone is sufficient for spacer acquisition’® ’’. It is important
to note that the CRISPR array used in this assay consists of a single
repeat, without pre-existing spacers (Extended Data Fig. 3). Therefore 
the Cas9 requirement is not a consequence of the phenomenon known 
as ‘primed’ spacer acquisition. This refers to an increase in the frequency 
of spacer acquisition observed in type I CRISPR-Cas systems that relies 
on the presence of a pre-existing spacer with a partial match to the phage
genome as well as the full targeting complex (Cascade)'?”"”. 


Cas9 specifies the PAM of newly acquired spacers 


Given this newfound requirement in the CRISPR adaptation process 
and the well-established PAM recognition function of Cas9 during the 
surveillance and destruction of viral target sequences, we hypothesized
that this nuclease could participate in the selection of PAM sequences
during spacer acquisition. To test this we exchanged the cas9 genes of S.
pyogenes (Sp) and S. thermophilus (St) CRISPR-Cas systems to create
two chimaeric CRISPR loci: tracrRNA^Sp-cas9^St-cas1^Sp-cas2^Sp-csn2^Sp
and tracrRNA^St-cas9^Sp-cas1^St-cas2^St-csn2^St (Fig. 2a). We chose the type
II-A CRISPR-Cas system of S. thermophilus (also known as CRISPR3 
(ref. 23)) because it is an orthologue of the S. pyogenes system™*. While 
the PAM sequence for the Sp CRISPR-Cas system is NGG, the PAM 
sequence for the St system is NGGNG” (Fig. 2b and Extended Data 
Table 1). We infected each naive strain with phage ϕNM4γ4, sequenced
the newly acquired spacers, and obtained the PAM of the matching 
protospacers using WebLogo”’. We found that each chimaeric system 
acquired spacers with PAMs that correlated with the cas9, but not the 
tracrRNA, cas1, cas2 or csn2, allele present (Fig. 2b and Extended Data 
Table 1). To rule out the possibility that non-functional spacers are 
negatively selected during phage infection, that is, they are acquired 
randomly but only those cells containing spacers with a correct PAM 
for Cas9 cleavage provide immunity and allow cell survival, we sequenced 
the PAMs of spacers acquired in the absence of phage infection (Figs 1c 
and 2c). Either Cas9^Sp or Cas9^St was produced in cells overexpressing
Cas1^Sp, Cas2^Sp and Csn2^Sp. In this experiment, as explained earlier, spacers
matching chromosomal or plasmid sequences were acquired. The PCR 
products containing new spacers were cloned into a commercial vector 
from which they were sequenced (Extended Data Table 1). Expression 
of Cas9^Sp led to the incorporation of spacers matching protospacers
with an NGG PAM sequence, whereas the expression of Cas9^St in the
same cells shifted the composition of the PAM to NGGNG (Fig. 2d). 
These results demonstrate that Cas9 determines PAM sequences dur- 
ing CRISPR adaptation to ensure the acquisition of functional spacers. 
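
The comparison between the two PAM specificities can be made concrete with a short sketch that tests the sequence immediately downstream of a protospacer against each motif. This is an illustration, not the analysis pipeline used in the study (which derived sequence logos with WebLogo): the function names, the toy genome and the five-nucleotide flank length are assumptions, and only the top strand is considered.

```python
import re

# IUPAC codes needed for the two motifs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# PAM recognized by each Cas9 orthologue, as stated in the text.
PAM_BY_CAS9 = {"Sp": "NGG", "St": "NGGNG"}


def pam_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an IUPAC motif such as NGG into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern))


def downstream_flank(genome: str, start: int, length: int, flank: int = 5) -> str:
    """Return the nucleotides immediately 3' of a protospacer at genome[start:start+length]."""
    end = start + length
    return genome[end:end + flank]


def compatible_cas9(flank: str) -> list:
    """List the Cas9 orthologues whose PAM matches the start of the flank."""
    return [name for name, pam in PAM_BY_CAS9.items()
            if pam_regex(pam).match(flank)]


if __name__ == "__main__":
    toy_genome = "AAAA" + "ACGTACGTAC" + "TGGAG" + "CCCC"  # hypothetical sequence
    flank = downstream_flank(toy_genome, start=4, length=10)
    print(flank, compatible_cas9(flank))
```

Because every NGGNG sequence also contains NGG, the two specificities are distinguished by enrichment at the fourth and fifth flanking positions rather than by mutually exclusive motifs.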


Cas9 associates with Cas1, Cas2 and Csn2


In type I CRISPR-Cas systems, Cas1 and Cas2 form a complex’* and 
the dsDNA nuclease activity of Cas1 has been implicated in the initial 
cleavage of the invading viral DNA to generate a new spacer”®. The 
genetic analyses presented above suggest that in the type II S. pyogenes 
CRISPR-Cas system, the PAM-binding function of Cas9 observed in 
vitro’ could specify a PAM-adjacent site of cleavage for Cas1, or other 
members of the spacer acquisition machinery. This would guarantee 
that newly acquired spacers have the correct PAM needed for Cas9 
activity later in this immune pathway. This hypothesis predicts an 
interaction between Cas9 and Cas1, Cas2 and/or Csn2. To test this we 
expressed the type II Cas operon in E. coli, using a histidyl tagged ver- 
sion of Cas9, and looked for other proteins that co-purify. We observed 
an abundant co-purifying protein with an apparent molecular weight 
close to 33 kDa, the expected size of Cas1 (Extended Data Fig. 4a). Mass 
spectrometry confirmed the identity of both of these proteins, as well as
the presence of Cas2 and Csn2 co-purifying with Cas9 (Extended Data 
Table 2). This result suggested the formation of a Cas9-Cas1-Cas2-Csn2
complex and therefore we explored other purification strategies
to unequivocally determine its existence. We were able to isolate a
Cas9-Cas1-Cas2-Csn2 complex when the histidyl tag was added to Csn2
(Fig. 3a, b). The identity of the purified proteins was confirmed by mass 
spectrometry (Extended Data Table 3). This demonstrates a biochemical 
link between the Cas9 nuclease and the other Cas proteins that function 
exclusively to acquire new spacers, supporting the role of Cas9 as a PAM 
specificity factor in the adaptation phase of CRISPR immunity. 


Cas9 PAM-binding motif is needed for spacer selection 

Within this complex the PAM-binding domain of Cas9 would specify 
a functional spacer (one adjacent to a correct PAM) and the nuclease 
activity of Cas1 and/or Cas9 would cleave the invading DNA to extract 
the spacer sequence. To test this model, we performed adaptation studies 
in the absence of phage selection as described in Extended Data Fig. 3 
but using different combinations of wild-type Cas1, Cas1(E220A) (catalytically 
dead or dCas1 (ref. 26)), wild-type Cas9, Cas9(D10A,H840A) 
(catalytically dead or dCas9 (ref. 6)) and Cas9(R1333Q,R1335Q) 
(abbreviated here as Cas9PAM, containing mutations in the PAM-binding 
motif that substantially reduce binding to target DNA sequences 
with NGG PAMs in vitro’’). We observed that the nuclease activity of 
Cas1 is necessary for spacer acquisition (Fig. 3c). In contrast, the nuclease 
activity and PAM-binding function of Cas9 are dispensable for this 
process. Next we determined the PAM of the acquired spacers in the 
presence of mutated Cas9 (Fig. 3d). We found that whereas spacers 
acquired in the presence of dCas9 displayed correct PAMs, those 
acquired in the presence of Cas9PAM matched DNA regions without 
a conserved flanking sequence, that is, without a PAM sequence. Cells 
containing the catalytically dead Cas9(D10A,H847A) from S. thermophilus 
acquired spacers with NGGNG PAMs (Extended Data Fig. 5). 
These results indicate that Cas1 and Cas9 are part of a complex dedicated 
to spacer acquisition which requires Cas1 nuclease activity and Cas9 
PAM-binding properties for the selection of new spacer sequences. 

©2015 Macmillan Publishers Limited. All rights reserved 

Figure 2 | Cas9 determines the PAM sequence of acquired spacers. 
a, c, Genetic composition of the CRISPR-Cas loci tested for spacer acquisition during 
phage infection (a), or in the absence of infection (c), with the experimental set 
up shown in Extended Data Fig. 3. b, d, Sequence logos obtained after the 
alignment of the 3’ flanking sequences of the protospacers matched by the 
newly acquired spacers in panels a and c, respectively. Numbers indicate the 
positions of the flanking nucleotides downstream from the spacer. Number of 
sequences used in each alignment indicated as n. 

Figure 3 | S. pyogenes Cas9 PAM recognition domain is required for the 
acquisition of spacers with an NGG PAM sequence. a, Separation of the 
Cas9-Cas1-Cas2-Csn2 complex by ion exchange chromatography. b, SDS-PAGE 
of fraction 19 (peak) from the complex elution shown in panel 
a, representative of five technical replicates. The four proteins of the complex 
were individually purified and run alongside the purified fraction to identify 
each protein in the complex. c, Spacer acquisition was tested as in Fig. 1c in the 
presence or absence of different Cas1 or Cas9 activities. Image is representative 
of eight technical replicates. dCas1, nuclease-dead Cas1 (E220A mutation); 
dCas9, nuclease-dead Cas9 (D10A, H840A mutations); Cas9PAM lacks the 
PAM recognition function (R1333Q, R1335Q mutations). d, Sequence logos 
obtained after the alignment of the 3’ flanking sequences of the protospacers 
matched by the newly acquired spacers in panel c. Numbers indicate the 
positions of the flanking nucleotides downstream from the spacer. Number of 
sequences used in each alignment indicated as n. 
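
The contrast between a conserved and a non-conserved flanking sequence can be expressed as per-position information content, the quantity that sequence logos display. The following Python is a minimal sketch under stated assumptions, not the procedure used to draw the logos in this paper: the function name `position_information` is hypothetical, the measure is the uncorrected 2 - H bits (logo software may apply a small-sample correction that is omitted here), and the five example flanks are taken from the PAM column of Extended Data Table 1.

```python
import math
from collections import Counter


def position_information(flanks):
    """Information content (bits) at each position of equal-length DNA flanks.

    Computed as 2 - H, where H is the Shannon entropy of the observed base
    frequencies at that position. No small-sample correction is applied.
    """
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("all flanks must have the same length")
    bits = []
    for i in range(length):
        counts = Counter(f[i].upper() for f in flanks)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


# Five 3' flanks copied from the PAM column of Extended Data Table 1.
flanks = ["aGGgt", "tGGta", "gGGag", "aGGca", "cGGta"]
info = position_information(flanks)
```

With these flanks, positions 2 and 3 (the invariant GG) reach the 2-bit maximum, whereas position 1 is close to zero. A set of flanks with no PAM, as described for Cas9PAM, would be expected to give low values at every position.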


Discussion 


The selection of new spacers with a correct PAM is fundamental for 
the survival of the infected host during CRISPR-Cas immunity. In the 
simplest scenario there is no active selection of PAM-flanked proto- 
spacers; any spacer sequence can be acquired but only those with the 
correct PAM allow Cas9 cleavage of the invader and survival. Bacteria 
that acquire spacers with ineffective flanking sequences are killed by 
the virus and as a consequence PAM-flanking spacers are enriched in 
the population. Here we show that even in the absence of phage selec- 
tion, the type II CRISPR-Cas system acquires new spacers with correct 
PAMs, a result that rules out the possibility of random spacer selection 
with subsequent selection for functional spacers. How are PAM-flanked 
protospacers selected during type II CRISPR-Cas immunity? One pos- 
sibility is that the proteins exclusively dedicated to spacer acquisition 
perform the PAM-selection function. The inability of cells overexpressing 
only cas1, cas2 and csn2 to expand the CRISPR array strongly 
suggests that none of the proteins encoded by these genes can recognize 
and select correct PAMs. Another possibility is that the known PAM- 
recognition function of Cas9 (refs 15, 27), essential for destroying the 
invading virus, could also be used during spacer acquisition to recog- 
nize PAM-flanking viral sequences. Experiments showing that the cas9 
allele, but not the cas1 or cas2 or csn2 alleles, determines the PAM se- 
quence of the newly acquired spacers, demonstrated that this scenario 
is probably correct. Regarding the molecular mechanism by which Cas9 
participates in CRISPR adaptation, our experiments show that Cas9 
forms a stable complex with Cas1, Cas2 and Csn2 that presumably 
participates in the selection of new spacers. The nuclease activity of 
Cas1, but not of Cas9, is required for spacer acquisition. The tracrRNA 
is also required, suggesting that the apo-Cas9 structure”, very different 
from holo-Cas9 (ref. 15), does not have the correct conformation to 
participate in spacer acquisition. The key residues involved in Cas9 
PAM recognition are not required for spacer acquisition, but they are 
necessary for the incorporation of new spacers with the correct PAM 
sequence. This suggests that the reported non-specific DNA binding 
property of Cas9 (refs 6, 7) is sufficient for spacer acquisition, but not 
for the selection of functional spacers. There are currently two models 
for the incorporation of new spacers into the CRISPR array, one where 
the future spacer sequence is cut from the invading viral DNA, the ‘cut 
and paste’ model, and another where this sequence is copied from the 
viral genome, the ‘copy and paste’ model”. In the context of the first 
model, our data suggest that, at a low frequency that may reflect the 
dynamics of spacer acquisition, Cas1 cleaves the invading genome to 
extract a new spacer sequence. However, on its own, Cas1 nuclease 
activity is non-specific’’. Therefore we propose that through the for- 
mation of the Cas9-Cas1-Cas2-Csn2 complex, Cas9 binding to PAM- 
adjacent sequences provides specificity to Cas1 endonuclease activity. 
In the copy and paste model, Cas1 nuclease activity is most likely 
necessary for downstream events, such as the cleavage of the repeat 
sequence that precedes spacer insertion, and Cas9 is required to ‘mark’ 
sequences adjacent to GG motifs to be copied into the CRISPR array. In 
any case, following as yet unknown processing and integration events, 
the selected DNA becomes a new functional spacer, that is, its match- 
ing protospacer will have the correct PAM to license Cas9 cleavage 
(Extended Data Fig. 6). The molecular steps that take place after pro- 
tospacer selection to incorporate it as a new spacer in the CRISPR array 
are still unknown. All genes of the type II-A CRISPR-Cas locus (tracrRNA, 
cas9, cas1, cas2 and csn2) are required for spacer acquisition; therefore 
most likely all the members of the Cas9-Cas1-Cas2-Csn2 complex 
participate in the process. Future work will address this and other aspects 
of the mechanisms of spacer integration in different CRISPR-Cas sys- 
tems. The present work reveals a new function for Cas9 in CRISPR im- 
munity. This nuclease is fundamental for both the execution of immunity, 
participating in the surveillance and destruction of infectious target 
viruses, and the generation of immunological memory, selecting the 
viral sequences that allow adaptation and resistance to viral predators. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 

Bacterial strains and growth conditions. Cultivation of S. aureus RN4220 (ref. 
18) was carried out in brain-heart infusion (BHI) or heart infusion (HI) media (BD) 
at 37 °C. Whenever applicable, media were supplemented with chloramphenicol 
at 10 µg ml⁻¹ or erythromycin at 5 µg ml⁻¹ to ensure pC194-derived (ref. 16) and 
pE194-derived”’ plasmid maintenance, respectively. 

On-plate spacer acquisition assay. To detect individual adapted colonies on a 
plate, cells from overnight cultures were mixed with phage at a m.o.i. value of 1 in 
top agar containing appropriate antibiotic and 5 mM CaCl2. The mixture was poured 
on BHI plates with antibiotic and incubated at 37 °C overnight. Subsequently, col- 
onies that survived phage infection were restreaked on fresh BHI plates in order to 
remove contaminating virus and dead cells. Plates were incubated at 37 °C overnight. 
To check for spacer acquisition, individual colonies were resuspended in lysis buffer 
(250 mM KCl, 5 mM MgCl2, 50 mM Tris-HCl at pH 9.0, 0.5% Triton X-100), treated 
with 50 ng µl⁻¹ lysostaphin and incubated at 37 °C for 5 min, then 98 °C for 5 min. 
Following centrifugation (16,000g), a sample of the supernatant was used as template 
for TopTaq PCR amplification with primers L400 and H050. The PCR reactions 
were analysed on 2% agarose gels (Fig. 1a). 

In-liquid spacer acquisition assay. Overnight cultures launched from single col- 
onies were diluted 1:1,000 into a fresh 10-ml culture of BHI containing appropriate 
antibiotic and 5 mM CaCl2. When the cultures reached a D600 nm of 0.4, depending 
on the experiment, they were either infected with phage at a m.o.i. value of 1 (Fig. 1b) or 
induced with 1 µg ml⁻¹ anhydrotetracycline (Fig. 1c). After 16 h, plasmids carrying 
the CRISPR systems were extracted using a slightly modified QIAprep Spin Miniprep 
Kit protocol: the pelleted bacterial cells were resuspended in 250 µl buffer P1 containing 
50 ng µl⁻¹ lysostaphin and incubated at 37 °C for 1 h, followed by the standard 
QIAprep protocol. 100 ng of plasmid DNA was used to amplify the CRISPR 
locus using Phusion DNA Polymerase (New England Biolabs) with the following 
primer mix: 3 parts JW8 and 1 part each of JW3, JW4 and JW5 (Extended Data 
Table 4). The following cycling conditions were used: (1) 98 °C for 30 s; (2) 30 
cycles of 98 °C for 10 s, 64 °C for 20 s, 72 °C for 10 s; (3) 72 °C for 5 min. The PCR reactions 
were analysed on 2% agarose gels. To sequence individual spacers, the 
adapted bands were extracted, gel-purified and cloned via Zero Blunt TOPO PCR 
Cloning Kit (Invitrogen). CRISPR loci of individual clones were checked for expan- 
sion of the arrays by PCR using the primers listed above and sent for sequencing. 
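
As a worked example of the cycling conditions given above, the programme can be written down as data and its total hold time computed. This Python is an illustrative sketch only: the variable and function names are hypothetical, and the total ignores ramping between temperatures, which depends on the thermocycler.

```python
# Each step is (temperature in degrees Celsius, hold time in seconds),
# taken from the cycling conditions stated in the text.
INITIAL_DENATURATION = (98, 30)
CYCLE_STEPS = [(98, 10), (64, 20), (72, 10)]
N_CYCLES = 30
FINAL_EXTENSION = (72, 5 * 60)


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(seconds for _, seconds in cycle_steps)
    return initial[1] + n_cycles * per_cycle + final[1]


total = total_hold_seconds(
    INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION
)
total_minutes = total / 60
```

Under these assumptions the programmed holds add up to 30 + 30 × 40 + 300 = 1,530 s, or 25.5 min.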
Phage adsorption assay. The phage adsorption assay was performed as described 
previously” with minor modifications. Cells were grown in BHI and 10 mM CaCl2 
to a D600 nm (OD600) of 0.4. The phage solution was prepared at 10° plaque-forming 
units (p.f.u.) per ml and 100 µl of this was added to 900 µl of cells. The mixture was 
incubated for 10 min at 37°C to allow adsorption of the phage to the cellular 
membrane. The mixture was centrifuged for 1 min at 16,000g and the number 
of phage particles left in the supernatant was determined by phage titre assay. 
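
The adsorption value follows from the free-phage titre by simple arithmetic. The sketch below assumes the normalization described in the legend of Extended Data Fig. 2, in which a control without host cells defines 100% free phage (0% adsorption); the function name `percent_adsorption` and the example titres are hypothetical and are not data from this study.

```python
def percent_adsorption(free_pfu: float, control_pfu: float) -> float:
    """Percentage of phage adsorbed, relative to a no-cell control.

    free_pfu:    p.f.u. remaining in the supernatant after incubation with cells
    control_pfu: p.f.u. recovered from the control lacking host cells
    """
    if control_pfu <= 0:
        raise ValueError("control_pfu must be positive")
    if free_pfu < 0:
        raise ValueError("free_pfu cannot be negative")
    return 100.0 * (1.0 - free_pfu / control_pfu)


# Hypothetical titres, for illustration only.
no_plaques = percent_adsorption(0, 1000)         # all phage adsorbed
half_adsorbed = percent_adsorption(500, 1000)    # half of the phage left free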
Phage titre assay. Serial dilutions of the phage stock were prepared in triplicate 
and spotted on fresh top agar lawns of RN4220 in HI agar supplemented with the 
appropriate antibiotic and 5 mM CaCl2. Plates were incubated at 37 °C overnight 
(Extended Data Fig. 2). 

High-throughput sequencing. Plasmid DNA was extracted from adapted cultures 
using the in-liquid spacer acquisition assay described above. 100 ng of plasmid 
DNA was used as template for Phusion PCR to amplify the CRISPR locus with primers 
H182 and H183 (Extended Data Table 4). Following gel extraction and purification 
of the adapted bands, samples were subjected to Illumina MiSeq sequencing. 
Plasmid construction. Construction of pWJ40 was described elsewhere’’. For the 
construction of pC194-derived and pE194-derived plasmids, cloning was per- 
formed using chemically competent S. aureus cells, as described previously'’. The 
Δcas1 (pRH059), Δcas2 (pRH061) and Δcsn2 (pRH063) mutants were constructed 
by one-piece Gibson assembly"! from pWJ40 using the pairs of primers H016-H017, 
H018-H019, H020-H021, respectively (Extended Data Table 4). Plasmid 
pRH087 containing the wild type cas genes of S. pyogenes was obtained by inserting 
the first spacer of S. pyogenes (annealed primers H049 and H050 containing compatible 
BsaI overhangs) in pDB184 using BsaI cloning”. BsaI cloning was also used 
to construct pRH079 and pRH233 by inserting a ϕNM4γ4-targeting spacer (annealed 
primers H029 and H030) into pDB114 and pDB184, respectively. Plasmid pRH200 
harbours the wild-type CRISPR3 system from S. thermophilus LMD-9 amplified 
with H168 and H169 from genomic DNA. The fragment was inserted on pE194 via 
Gibson assembly using H166 and H167. pRH213 was constructed by replacing 
Cas9Sp on pRH087 with Cas9St from pRH200 using the primer pairs H232-H233 
and H231-H234, respectively. pRH214 was constructed by replacing Cas9St on 
pRH200 with Cas9Sp from pRH087 using the primer pairs H227-H230 and H228-H229, 
respectively. pGG32 was created by reducing the CRISPR locus of pWJ40 to 
a single repeat. This was accomplished by ‘round the horn’ PCR® using primers 
oGG82 and oGG83, followed by blunt ligation. pRH228 was constructed by replacing 
Cas9Sp on pGG32 with Cas9St from pRH200 using the primer pairs H232-H233 
and H231-H234, respectively. pRH223 was constructed as a three-piece Gibson 
assembly combining TetR+ ptet from pKL55-iTet (primers B534 and B616), pE194 
(primers B532 and B617) and the cas1, cas2, csn2 genes and the array from pGG32 
(primers H176-H177). pRH231 was constructed from pGG32 by one-piece Gibson 
assembly with primers H289-H290. pRH234 contains Cas1 E220A and was constructed 
via one-piece Gibson assembly from pRH223 using the primer 
pair H312-H313. pRH227 was constructed from pGG32 via two sequential 
single-piece Gibson assemblies: first, D10A was introduced with B337-B338 and 
second, H840A was introduced with B339-B340. pRH229 was constructed via one- 
piece Gibson assembly from pGG32 using the primer pair H276-H277. Plasmids 
pRH240, pRH241, pRH242, pRH243 and pRH244 were constructed by one-piece 
Gibson assembly with primers H237-H238 from pGG32, pRH228, pRH227, 
pRH229 and pRH231, respectively. pRH245 was constructed from pRH241 via 
two sequential single-piece Gibson assemblies: first, D10A was introduced with 
H336-H337 and second, H847A was introduced with H338-H339. 

Isolation and sequencing of ϕNM4γ4. For the initial isolation of ϕNM4, supernatants 
from overnight cultures of S. aureus Newman were filtered and used to 
infect soft agar lawns of TB4::ϕNM1,2 double lysogens’’. A single plaque was 
picked and then plaque-purified in two additional rounds of infection using TB4 
soft agar lawns, and subsequently used to lysogenize TB4. For the resultant lysogen, 
specific primers were used to verify the presence of ϕNM4 and the absence of 
ϕNM1,2 by colony PCR. High titre lysates of ϕNM4 (~10"' p.f.u. per ml) were 
then prepared from this lineage and used for infection of TB4/pGG9 soft agar 
lawns harbouring spacer 2B’’. An escaper plaque was picked and then plaque-purified 
in two additional rounds of infection using TB4/pGG9 soft agar lawns. 
The resultant ϕNM4γ4 phage exhibited a clear plaque phenotype and was used to 
prepare a high titre lysate from which DNA was purified, deep sequenced, and 
assembled as described previously’’. The full sequence of ϕNM4γ4 has been 
deposited in GenBank under accession number KP209285, and includes a 2,784 bp 
deletion encompassing the C-terminal 80% of the ϕNM4 cI-like repressor gene. 
Protein purification of Cas9. pMJ806 (wild-type Cas9) plasmid was obtained from 
Addgene. The proteins were purified as described before® with minor modifica- 
tions as follows. The proteins were expressed in E. coli BL21 Rosetta 2(DE3) codon 
plus cells (EMD Millipore). Cultures (2 litres) were grown at 37 °C in Terrific Broth 
medium containing 50 µg ml⁻¹ kanamycin and 34 µg ml⁻¹ chloramphenicol until 
the D600 nm reached 0.6. The cultures were supplemented with 0.2 mM 
isopropyl-1-thio-β-D-galactopyranoside and incubation was continued for 16 h at 16 °C with 
constant shaking. The cells were collected by centrifugation and the pellets stored 
at −80 °C. All subsequent steps were performed at 4 °C. Thawed bacteria were 
resuspended in 30 ml of buffer A (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 200 mM 
Li2SO4, 10% sucrose, 15 mM imidazole) supplemented with complete EDTA-free 
protease inhibitor tablet (Roche). Triton X-100 and lysozyme were added to final 
concentrations of 0.1% and 0.1 mg ml⁻¹, respectively. After 30 min, the lysate was 
sonicated to reduce viscosity. Insoluble material was removed by centrifugation for 
1 h at 16,200g in a Beckman JA-3050 rotor. The soluble extract was mixed in batch 
for 1 h with 5 ml of Ni2+-nitrilotriacetic acid-agarose resin (Qiagen) that 
had been pre-equilibrated with buffer A. The resin was recovered by centrifugation, 
and then washed extensively with buffer A. The bound protein was eluted stepwise 
with aliquots of IMAC buffer (50 mM Tris-HCl pH 7.5, 250 mM NaCl, 10% 
glycerol) containing increasing concentrations of imidazole. The 200 mM imidazole 
eluates containing the His6-MBP tagged Cas9 polypeptide were pooled together. 
The His6-MBP affinity tag was removed by cleavage with TEV protease during 
overnight dialysis against 20 mM Tris-HCl pH 7.5, 150 mM KCl, 1 mM TCEP and 
10% glycerol. The tagless Cas9 protein was separated from the fusion tag by using 
a 5 ml SP Sepharose HiTrap column (GE Life Sciences). The protein was further 
purified by size exclusion chromatography using a Superdex 200 10/300 GL in 
20 mM Tris HCl pH 7.5, 150 mM KCl, 1 mM TCEP, and 5% glycerol. The elution 
peak from the size exclusion was aliquoted, frozen and kept at −80 °C. 

Protein purification of Cas1. Plasmid pKW01 (wild-type Cas1) was constructed 
by using pWJ40 as a template for polymerase chain reactions 
(PCRs) to clone Cas1 into pET28b-His10Smt3 using the primers PS192 and PS193 
(Extended Data Table 4). Full sequencing of the cloned DNA fragment confirmed 
perfect matches to the original sequence. The pKW01 plasmid was transformed 
into E. coli BL21 (DE3) Rosetta 2 cells (EMD Millipore). Cultures were grown and 
protein was purified by a Ni-affinity chromatography step, as described above for 
Cas9 purification. The 200 mM imidazole eluates containing the His10-Smt3 
tagged Cas1 polypeptide were pooled together. The His10-Smt3 affinity tag was 
removed by cleavage with SUMO protease during overnight dialysis against 
50 mM Tris-HCl pH 7.5, 250 mM NaCl, 20 mM imidazole and 10% glycerol. The 
tagless Cas1 protein was separated from the fusion tag by using a second Ni-NTA 
affinity step. The protein was further purified by size exclusion chromatography 
using a Superdex 200 10/300 GL in 20 mM Tris HCl pH 7.5, 500 mM KCl, 1 mM 
TCEP, and 5% glycerol. The elution peak from the size exclusion was aliquoted, 
frozen and kept at −80 °C. 
Protein purification of Cas2. The sequence encoding Cas2 was PCR amplified 
with primers PS334 and PS335 from pWJ40 and inserted into a pET-His6 MBP 
TEV cloning vector (Addgene Plasmid number 29656) using ligation independ- 
ent cloning (LIC). Sequencing of the resultant plasmid (pPS059) confirmed the 
matches to the wild-type sequence. The protein was expressed and purified 
following the same procedure as that for Cas9. 

Protein purification of Csn2. Plasmid pPS060 was constructed by using 
pWJ40 as a template for polymerase chain reactions (PCRs) to 
clone Csn2 into pET28b-His10Smt3 using the primers PS336 and PS337. Full 
sequencing of cloned DNA fragment confirmed perfect matches to the original 
sequence. Csn2 was expressed and purified following the same method as that of 
Cas1. Previously Csn2 was shown to form a tetramer™. Protein concentrations for 
all the purifications were determined by using the Bradford dye reagent with BSA 
as the standard. 

Protein purification of Cas9-Cas1-Cas2-Csn2 complex. pKW07 (His10-Cas9-Cas1-Cas2-Csn2) 
was constructed by amplification of pWJ40 with primers 
PS199/PS202 and pET16b (Novagen) with primers PS200/PS203, followed by 
Gibson assembly of the fragments. Full sequencing of cloned DNA fragment was 
done to confirm perfect matches to the original sequence. The proteins were expressed 
in E. coli BL21 Rosetta 2(DE3) codon plus cells (EMD Millipore). Cultures were 
grown and protein was purified by Ni-affinity chromatography step, as mentioned 
before in Cas9 purification with minor modifications. The 200 mM imidazole eluates 
were dialysed overnight against 20 mM Tris-HCl pH 7.5, 150 mM KCl, 1 mM TCEP 
and 10% glycerol and subjected to mass spectrometry for the identification of the 
co-purifying proteins. pKW06 (Cas9-Cas1-Cas2-Csn2-His6) was constructed by 
amplification of pWJ40 with primers PS204/PS205 and pET23a (Novagen) with 
primers PS206/PS207 (Extended Data Table 4), followed by Gibson assembly of 
the fragments. Full sequencing of cloned DNA fragment was done to confirm per- 
fect matches to the original sequence. The proteins were expressed in E. coli BL21 
Rosetta 2(DE3) codon plus cells (EMD Millipore). Cultures were grown and pro- 
tein was purified by Ni-affinity chromatography step, as mentioned before in Cas9 
purification with minor modifications. The 200 mM imidazole eluates were dialysed 
overnight against 20 mM Tris-HCl pH 7.5, 150 mM KCl, 1 mM TCEP and 
10% glycerol. The proteins were further purified using a 5 ml SP Sepharose HiTrap 
column (GE Life Sciences), eluting with a linear gradient of 150 mM-1 M KCl. 
Sample size. No statistical methods were used to predetermine sample size. 
Extended Data Figure 1 | The S. pyogenes type II CRISPR-Cas system 
displays a strong bias for the acquisition of spacers matching viral 
protospacers with NGG PAMs. a, Analysis of bacteriophage-insensitive 
mutant colonies using PCR and agarose gel electrophoresis, representative of 
five technical replicates. Bacteria and phage were mixed in top agar and 
incubated overnight. DNA was isolated from individual colonies resistant to 
phage infection and used as template for a PCR reaction with primers (arrows) 
H182 and H183 (Extended Data Table 4), which amplify the 5’ end of the S. 
pyogenes CRISPR array. The size of the PCR band indicates the number of new 
spacers (shown at the top of the gel). Cells without additional spacers resist 
infection by a CRISPR-independent mechanism, presumably envelope 
resistance. b, Analysis of acquired spacers during phage infection of a 
population of bacteria carrying the S. pyogenes type II CRISPR-Cas system. 
Liquid cultures of bacteria were infected with phage, surviving cells were 
collected at the end of the infection, DNA extracted and used as template for a 
PCR reaction as described above. Amplification products were separated by 
agarose gel electrophoresis and the DNA of the bands corresponding to 
products with additional spacers was extracted and sent for MiSeq next-generation 
sequencing. Reads corresponding to newly acquired spacers were 
plotted according to their position in the phage ϕNM4γ4 genome (x axis) and 
their abundance (y axis). Each dot represents a unique spacer sequence; blue 
and red dots indicate a corresponding protospacer with an NGG or non-NGG 
PAM, respectively. Top and bottom plots indicate protospacers in the top and bottom 
strands of the ϕNM4γ4 DNA. The map as well as the different functions of the 
phage genes are indicated in between the plots. The raw data used to make this 
graph is in the Source Data file. c, Weblogo showing the conservation of the 5’ 
flanking sequences of 10,000 protospacers randomly selected from the 
experiment shown in b. Absolute conservation of the NGG PAM was observed. 
Extended Data Figure 2 | cas1, cas2 and csn2 are not required for the 
execution of immunity. a, Analysis of bacteriophage-resistant mutants that do 
not acquire a new spacer. Three colonies that survived phage infection in our 
on-plate adaptation assay (Extended Data Fig. 1) were subjected to phage 
adsorption assay. Briefly, surviving colonies as well as the wild-type S. aureus 
RN4220 control were grown in liquid and mixed with bacteriophage. After a 
brief incubation, cells were pelleted by centrifugation and the phages present in 
the supernatant (unable to bind and infect cells) were counted on a lawn of 
sensitive cells. The number of plaque-forming units (p.f.u.) of a control 
experiment in the absence of host cells was used to determine the 100% free- 
phage, or 0% adsorption value. No plaques were observed in the control 
experiment using wild-type cells and this value was used to set the 100% 
adsorption limit. The three CRISPR-independent, bacteriophage-resistant 
mutants displayed a marked defect in phage adsorption (about 50%), indicating 
that most likely they carry envelope resistance mutations. Error bars: 
mean ± s.d. (n = 3). b, cas1, cas2 and csn2 are not required for the execution of 
immunity using previously acquired spacers. Position within the phage 
ϕNM4γ4 genome of the type II CRISPR-Cas target used in this experiment. 
The protospacer sequence is in the bottom strand (shown in 3’-5’ direction) 
and flanked by a TGG PAM (in green). c, Comparison of immunity provided by 
a type II CRISPR-Cas system programmed to target the sequence shown in 
panel b in the presence (wild-type, wt) or absence (Δcas1, Δcas2, Δcsn2) of cas1, 
cas2 and csn2. Immunity is measured as the p.f.u. of a ϕNM4γ4 phage lysate 
spotted on top agar lawns of S. aureus RN4220 cells containing no CRISPR 
system (−), a wild-type S. pyogenes CRISPR-Cas type II system (wt, pRH233), 
or the same CRISPR-Cas system with a deletion of the cas1, cas2 and csn2 genes 
(Δcas1, Δcas2, Δcsn2, pRH079). Error bars: mean ± s.d. (n = 3). 
Extended Data Figure 3 | Generation of an experimental system for the 
overexpression of cas1, cas2 and csn2 and the detection of spacer acquisition 
in the absence of phage infection. a, Plasmids used in the spacer acquisition 
experiments presented in Figs 1c and 2c, d. pRH223 contains cas1, cas2 and 
csn2 from S. pyogenes under a tetracycline-inducible promoter. Cells containing 
this plasmid only acquired spacers when a second plasmid expressing cas9 was 
introduced, pRH240 or pRH241, containing the tracrRNA gene, the leader and 
first repeat from the S. pyogenes type II CRISPR-Cas system as well as cas9 from 
S. pyogenes (cas9Sp) or S. thermophilus (cas9St), respectively. The leader is a 
short, AT-rich sequence immediately upstream of the first repeat that contains 
the promoter for the transcription of the CRISPR array. b, Highly sensitive PCR 
assay to enrich for amplification products of adapted CRISPR loci. Arrows 
indicate primer annealing position and direction. The forward primer (JW8) 
anneals on the leader. For the reverse primer, a cocktail of JW3, JW4 and JW5 
was used. The three reverse primers anneal on the repeat and differ only in their 
3’-end nucleotide that never matches the last nucleotide of the leader (red 
arrowhead). Because this nucleotide is critical for the annealing of the primers, 
loci that acquire spacers ending in A, C or T are preferentially amplified over 
unadapted loci. c, To quantify the sensitivity of this technique, we mixed 
pGG32 (one repeat, unadapted) with pRH087 (repeat-spacer-repeat, adapted) 
in known ratios. The amplification of adapted plasmid was detected even when 
it represented 0.01% (10⁻⁴) of the total plasmid template, representative of 
three technical replicates. This highly sensitive PCR assay is not required to 
detect acquisition during phage infection, as in this case adapted cells survive 
and are enriched within the population, making their detection much easier. 
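
The selectivity of the reverse-primer cocktail described in panel b can be sketched as a simple rule. The Python below is a toy model under explicit assumptions, not a description of primer thermodynamics: it assumes that amplification depends only on the base immediately upstream of the repeat, it infers from the legend (spacers ending in A, C or T are preferentially amplified) that the leader ends in the one remaining base, G, and it does not model strand conventions. All names are hypothetical.

```python
# Assumption inferred from the legend: the three reverse primers (JW3, JW4,
# JW5) together cover loci in which the base just upstream of the repeat is
# A, C or T. The base they do not cover is taken to be the last nucleotide
# of the leader. Strand conventions are not modelled here.
COVERED_BASES = {"A", "C", "T"}
LEADER_LAST_BASE = "G"  # inferred, not stated explicitly in the text


def preferentially_amplified(base_upstream_of_repeat: str) -> bool:
    """True if a locus with this base upstream of the repeat is favoured."""
    return base_upstream_of_repeat.upper() in COVERED_BASES


# Unadapted locus: the leader directly abuts the first repeat.
unadapted = preferentially_amplified(LEADER_LAST_BASE)

# Adapted loci: the new spacer's last base now sits upstream of the repeat.
adapted = {base: preferentially_amplified(base) for base in "ACGT"}
```

One consequence of this model is that an adapted locus whose new spacer ends in the same base as the leader would not be enriched, so such spacers may be under-sampled by the assay.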




Extended Data Figure 4 | Purification of Cas9-Cas1-Cas2-Csn2 
complexes. a, The cas9-cas1-cas2-csn2 operon of S. pyogenes SF370 was 
cloned into the pET16b vector (generating pKW07) to add an N-terminal 
histidyl tag to Cas9 and express all proteins in E. coli. Purification was 
performed using Ni-NTA affinity chromatography. SDS-PAGE followed by 
Coomassie staining of the purified proteins revealed a co-purifying protein that 
was identified as Cas1 by mass spectrometry, in a result representative of five 
technical replicates. Mass spectrometry identification of all the eluted proteins 
co-purifying with Cas9 is shown in Extended Data Table 2. b, The cas9-cas1- 
cas2-csn2 operon of S. pyogenes SF370 was cloned into the pET23a vector 
(generating pKW06) to add a C-terminal histidyl tag to Csn2 and express all 
proteins in E. coli. Purification was performed using Ni-NTA affinity 
chromatography followed by ion exchange chromatography. The elution 
fractions that constituted the peak containing the complex (Fig. 3a) were 
separated by SDS-PAGE and visualized by Coomassie staining, representative 
of three technical replicates. 
Extended Data Figure 5 | dCas9St can also support spacer acquisition. A 
plasmid derived from pRH241 containing mutations in the active site of S. 
thermophilus Cas9 (D10A, H847A; dCas9St) was used to characterize spacer 
acquisition in the absence of phage infection. Upon overexpression of Cas1, 
Cas2 and Csn2 using anhydrotetracycline (aTc), we were able to detect spacer 
acquisition. Sequencing of spacers and alignment of the protospacer flanking 
sequences demonstrated the selection of an NGGNG PAM. The image is 
representative of three technical replicates. 
Extended Data Figure 6 | A model for the selection of PAM-flanking spacers 
by Cas9. After injection of the phage DNA, an adaptation complex formed by 
Cas9, Cas1, Cas2 and Csn2 uses the Cas9 PAM binding domain to specify 
functional protospacers, that is, those that are followed by the correct PAM. It is not 
known how the protospacer sequence is extracted from the viral DNA to 
become a spacer. In the ‘cut and paste’ model, a nuclease, possibly Cas1, cuts the 
viral DNA to generate the spacer. In the ‘copy and paste’ model the protospacer 
sequence is copied first. Once loaded with the selected protospacer sequence, 
this complex promotes the integration of this sequence into the CRISPR array, 
thus becoming a new spacer. Previous studies demonstrated that Cas1
dimerizes and interacts with Cas2 (ref. 13); Csn2 has been determined to form a
tetramer.
Extended Data Table 1 | Sequences of the spacers analysed to obtain the sequence logos in this study
Figure Spacer Sequence PAM Target Figure Spacer Sequence PAM Target
2b 1 gcaacaatgggaaccaagctatgttgatag aGGgt phage 2d 4 tacaatgtaggctgctctacacctagcttc tGGgc pRH223 
1st logo 2 gagaacaaaaccatcctacccggtaataaa tGGta phage 1st logo 5 tttgattacaatggcacatgtacttatgcc tGGat chromosome 
3 aatagagatactttatctaacatgatacac gGGag phage 6 catttgtcttagcacatgaattaggtcatgc aGGtc chromosome 
4 ccattttagatttcaaaagtttagtatctat aGGca phage 7 catgattgcacccattgttgcacctagtac  aGGtt chromosome 
5 agtattggaatctgatgaatattcatctct cGGta phage 8 taccaataacttaagggtaactagcctcgc cGGca chromosome 
6 agaaaatttatacattgattattcaccaac aGGca phage 9 gagtatgtttgcgcgtgaagtggttgtgtc tGGat pRH223 
7 acatactccaaacaattgatggatttgtgt aGGtg phage 10 agaatggttagatttatggcgtgatgtaac gGGca chromosome 
8 gctaagactgtgaagcataatactgctact aGGta phage 11 ctgcttccatgataactggaccatcagcaac cGGat chromosome 
9 ttttaagctattcattttaaaaggtcatat gGGca phage 12 gattgaagctacaatacctgatgttgctgc gGGaa chromosome 
10 acttatgccgtttctatacttcactacagca tGGtc phage 13 cgaaatacttggctaagcacgacgaggcct tGGtg pRH223/240 
11 atgaatggattgaagagaacacagacgaac aGGac phage 14 gctctacacctagcttctgggcgagtttac gGGtt pRH223 
12 ccacaaatagaaatagagctagggagtttaa cGGta phage 15 atatggaagttacattttttggaacgagtgc aGGtt chromosome 
13 attagttactccacaaatagaaatagagct aGGga phage 16 ttatgaagcgttacgtcaacaagattttcc aGGat chromosome 
14 ggagtaactaatatctgaattgttatcagt tGGtt phage 17 ttatcgaagtatacgagttcacagaagaac  aGGct chromosome 
15 tagttttttgagtatgcttactttttcttg tGctt phage 18 ccagttcttgttgttttggtgctttagtca aGGtt chromosome 
16 tgaacgaattgtcagtatgtacagattaat aGGaa phage 19 tggatgatcttgtctttcatgtgtacctgt tGGaa chromosome 
17 cattacggacgtagtagaagcaattagaaa tGGaa phage 20 caggatttagttttcctagcggtcatgcta tGGga chromosome 
18 tggatatgacgaccaagatttagcgtttta aGGtg phage 2d 1 ttgactatcaaatgtctttttcaatgtttc gGGtG chromosome
19 cgacataacgctaatacatgtttgtcatag tGGctt phage 2nd logo 2 atccgttctgcagaagagattgtttcttgc aGGcG pRH223 
20 acaaacttaacaatagtggttttttcaaga gGGag phage 3 tgaacatttcgattatgtattaatgagtgc tGGtG chromosome
2b 1 agagtacaatattgtcctcattggagacac tGGgG phage 4 catctttaggacgaatgccagcacgttctgc tGGaG chromosome
2nd logo 2 tgtttgggaaaccgcagtagccatgattaa gGGtG phage 5 caccatgttaaaaatacctccatcatcacc aGGaG pRH223/241
3 ctcatattcgttagttgcttttgtcataaa aGGtG phage 6 tcgtgagacagttcggtccctatccgtcgt gGGcG chromosome 
4 tttatgtctatatactcaaagtaatcatttt cGGaG phage 7 tttgcgcagtcggcttaaaccagttttcgc tGGtG pRH223
5 taatatcaacggtatgtggetgtctggtga cGGtG phage 8 aaagaagtcataagtaccatgacttgagtt tGGtG chromosome 
6 aataagtctaaaaaaccaacgtttaatgat tGGgG phage 9 ctaatttttcttcttcaacaccatctatggc tGGcG chromosome
7 gttgatattacgttcatagaacatacctga tGGtG phage 10 ccaagtattcaaagttggaacggetgstct aGGtG chromosome 
8 tcaatgtttggtacaagttggtcacagata tGGaG phage 11 atccgttctgcagaagagattgtttcttgc aGGcG pRH223
9 ttagttactccacaaatagaaatagagcta gGGaG phage 12 tttgcgcagtcggcttaaaccagttttcgc tGGtG pRH223
10 caattgtttttcttggaaatcatatttata cGGcG phage 13 aacgcgtatacatagcaagcgttctcatgt tGGaG chromosome
11 tatctaagtttgccaattattacattaaagce tGGtG phage 14 agtttgggagtcaattatcggctttttaac tGGcG chromosome
12 taggacatagagatgaaaaaacgactataa aGGtG phage 3d 1 tgacttctctgaagagccatctttttgcact tGGaa chromosome
13 tgaagaaatgattcaagaaacacaaaagag tGGcG phage 1st logo 2 ggtcagatgcaattcgacatgtggacggac tGGctt pRH223 
14 tcggactgttaggegtacgcgaagggcaaaa aGGaG phage 3 atcttttctagcttttctccaagcacagac aGGac chromosome 
15 aatactttcttctaaaaaacctaagtcaac aGGaG phage 4 gttggtctaattgtttcaatagttccacct tGGtc chromosome 
16 taatccaattacaacattaaaaattaatga cGGaG phage 5 tgccggttgggetgectgagacggcaccct aGGaa chromosome 
17 acaatgttaagcaaccagcacattacacata cGGcG phage 6 tgagtatgtttgcgcgtgaagtggttgtgtc tGGat pRH223
18 ggattttaaaataaaagtaaatgttgatac tGGcG phage 7 ttgagttagaaaacggtcgtaaacggatgc tGGct pRH242 
19 caggcaatgttattttatcggattttaaaaa cGGcG phage 8 agtttgggagtcaattatcggctttttaac tGGcg chromosome
20 agaatctttattattagctgacttacaaga aGGtG phage 9 aattaagaaatcttctaaccaactgattgc tGGaa chromosome 
21 aaaaccccaatatcttttaaaaataaagtt aGGtG phage 10 aacagaaagaataggaaggtatccgactgc tGGta pRH223/242
22 tagggcaatgattgaagaatttgatgataa cGGaG phage 11 tggtattgtaggcgttattttaggtattcc gGGat chromosome 
2b 1 aaaggcaacatatttgaatcatcacatttat tGGaG phage 12 aaatctcagcaggacaagctggtacaggtgc tGGctt chromosome 
3rd logo 2 ttggaatggaattaaacaataaaactttta tGGaG phage 13 ctcaagagatttggagcatccaatcaatgc aGGtc pRH223 
3 atattcatcagattccaatactacgttaat aGGtG phage 14 ctaaggtggcaccacggtaacgcgtccttac aGGta chromosome 
4 acaatttaaaaattagaaatgtaaatgtag aGGtG phage 15 tgattaaacttaaaaatgtattacctagtgc aGGta chromosome 
5 cagaatgaactatgaaacagggetccaact aGGtG phage 16 atttgagtcagctaggaggtgactgatggc tGGctt pRH223 
6 acataacatcaaaaccctttctgaagaaat tGGtG phage 17 ataagagaagatgctagacgtataagttcac tGGtc chromosome 
7 taagttgtttgaaatgtacgagatggaagg aGGaG phage 18 acgttttatctgtatttgcgacaatcgttg gGGta chromosome 
8 atacgtgtaaagacatattagatcgagtca aGGaG phage 19 ataacatacgccgagttatcacataaaagc gGGaa pRH223 
9 tgtgcaggagctacgttcaataaatgtgaa aGGaG phage 20 gcattttaaacaaaaaaagatagacagcac tGGca pRH223 
10 ttaagaaagttattgtcatcgagcttaaat tGGtG phage 21 aatcccagttagaacaaacgctaaaatggc gGGcc chromosome 
11 acacacatactaaacctgaacgattaagga gGGgG phage 22 taccaataacttaagggtaactagcctcgc cGGca chromosome 
12 tttaccaacatccttagttgatagattttt aGGcG phage 3d 1 gaagtctagctgagacaaatagtgcgatta caaaa pRH223/243 
13 gtttgaatacgttccgtttctgatacccagt aGGcG phage 2nd logo 2 agcatagctctaaaacctcgtagactattt ttgtc pRH223/243 
14 aagttaaaaagaatttaaagtcaagaagta tGGgG phage 3 aaattttttagacaaaaatagtctacgag gtttt pRH223/243 
15 attctcagaagatagcgaagatgggagaaa aGGaG phage 4 aagtcgaacttcataatcatcgctttcgg catat chromosome 
16 tgagcgactgctgggtgtgcttcgaatagtt tGGcG phage 5 ccaatttctacagacaatgcaagttggget gtggg chromosome
17 taatatatgctcatacttaattgaattgtc tGGtG phage 6 gttatttctgaaatgcccgttacatcacgc cataa pRH243 
18 atcttcttttttaatacgtccatcaacaag cGGtG phage 7 tgtttgccctccaaatatgaaaacatggcc cggta chromosome 
19 cgatattggcggtgtgaataataactttaa aGGaG phage 8 atgagatgaggcgataaaagaacgtcgcta aaacg chromosome 
20 caacgagctggcaacaacataagatgacag aGGcG phage 9 tactacttcaaggaattctatagaacctac tatat chromosome 
2b 1 taaactactacgacttaagcagetgccata tGGca phage 10 gtaccacagtgccacatgttggcaattggc gagac chromosome
4th logo 2 gacaaatgctattcaacattcagttaaaga aGGta phage 11 taaagctggtgaagcgattaacactgtacc aagta chromosome
3 acaattattaattgaacaagcgcaagctaa cGGct phage 12 atttcttcgttattagaaatataaaattgc gttgt chromosome 
4 cacatcaattagtaagacgccaaaagtaac aGGta phage 13 attttttatgattaagccatatggggttaa gcaag pRH223/243 
5 aaacgatgagtacacaaaatacaaaatcta cGGca phage 14 aaagactgggatccaaaaaaatatggtggt tttga pRH243 
6 gtaataatatttttaataacctcaacatct tGGtc phage 15 attttcaaatgcataaaaactgtttctcaac gatat chromosome 
7 tcatgaaaaagtgaattgctagtagtgtget tGGtc phage 16 ttttgtattggaatggcattttttgctatc aaggt chromosome
8 tacgctatcgcaaaagcagtcaaagctaaa gGGca phage 17 taaaacaggaccacttgtcatgtaagcttt aagtt pRH223/243
9 agggaatcttacagttattaaataactatt tGGat phage 18 tgtatcttgtggtttcatctgtgctaacttt ggcag chromosome
10 aaaacgagcaaattaagtggtacgtagaca aGGgt phage 19 agaggatgcagaacgtgcaatcttagctgc aagac chromosome
11 ctaaatgttgccatttcgttatctcctttc tGGta phage 20 ttcaaacgagaataattatggcgttggttta ggtat chromosome
12 actggatgacattgaacaaagcaccgaata tGGcc phage 21 gcgaatacactcattaaaacaattgcatcc tgatt chromosome
13 taaatatttgataacaacattatacacgaa aGGag phage 22 tcttatcttgataataagggtaactattgc cgatg chromosome 
14 cacatcaattagtaagacgccaaaagtaac aGGta phage 23 ccaaataaaggtgcgttattaataacagtgc caggc chromosome
15 aaggtgatgacggcgaatggtacacaacata tGGtc phage 24 acaacagtacgccaaccagccatcagtcac ctcct pRH223 
16 taacgacggtacttattccgtcgttgctac tactg phage Ext. Data 1 cagctaacaatgccatgattgetcggctga gGGaG pRH223 
17 ataaataaaaaagttactactcacacacta aGGca phage Fig. 5 2 tggtaaatttacagaagatgctgaagatgc tGGtG chromosome
18 tctaggttcgaactcttctttaaatttaat aGGca phage 3 gagtcagctaggaggtgactgatggctget tGGcG pRH223 
19 ctcatcaatatcattctgattggttatttt gGGat phage 4 caaataagtctagacatattagctcgttatc aGGtG chromosome
20 tctctttgataaataactttatccacataa aGGtg phage 5 acgaccttgttgcaacatagcgccccactc tGGtG chromosome 
21 ttagacttttactttccattacttaaatca tGGtc phage 6 acatgttatgcatatcgtaagtgaagtcac aGGtG chromosome
22 aatttgttcttgcgcttcaatagtgatagt aGGgt phage 7 atccgttctgcagaagagattgtttcttgc aGGcG pRH223 
23 ataagtctaaaaaaccaacgtttaatgatt gGGga phage 8 agatgcttgttgtgttgtttgtgttgatgc cGGtG chromosome 
2d 1 acatgttatgcatatcgtaagtgaagtcac aGGta chromosome 9 atccgttctgcagaagagattgtttcttgc aGGcG pRH223 
1st logo 2 agatcaaattgtaacaactaatcctattgc aGGta chromosome 10 agtttgggagtcaattatcggctttttaac tGGcG chromosome
3 gtttcagcaatatatctcttagtgcatcac cGGtt chromosome 11 aaaaagttatctcgtagacattacactggc tGGgG pRH245 
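The PAM column above can be screened against the two motifs named in the captions (NGG and NGGNG). The following is a minimal sketch, not the authors' analysis pipeline: the function name `classify_pam` and the three-way labelling are hypothetical, and the example flanks are copied from a handful of rows of the table. The capitalization used in the table to mark conserved positions is ignored here.

```python
import re
from collections import Counter

# Five-nucleotide flanking sequences copied from a few rows of the table above.
EXAMPLE_FLANKS = ["aGGgt", "tGGta", "tGGgG", "gGGtG", "caaaa", "ttgtc"]

def classify_pam(flank: str) -> str:
    """Label a 5-nt flank as 'NGGNG', 'NGG' or 'none' (hypothetical labels)."""
    s = flank.upper()
    if re.fullmatch(r"[ACGT]GG[ACGT]G", s):
        return "NGGNG"          # the longer motif is checked first
    if re.fullmatch(r"[ACGT]GG[ACGT]{2}", s):
        return "NGG"
    return "none"

if __name__ == "__main__":
    counts = Counter(classify_pam(f) for f in EXAMPLE_FLANKS)
    for label, n in counts.items():
        print(label, n)
```

Because an NGG flank ends in G by chance about a quarter of the time, a single flank matching NGGNG says little on its own; the distinction only becomes meaningful across many spacers, which is what the sequence logos summarize.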
Extended Data Table 2 | Mass spectrometry analysis of proteins purified through Ni-NTA


Accession Protein % Coverage Unique Peptides Total peak area
Cas9 83.26 170 9.7x10^10
Cas1 91.35 40 1.9x10^10
Cas2 84.07 13 1.9x10^9
Csn2 91.82 18 2.9x10^9
P77398 Bifunctional polymyxin resistance protein ArnA (arnA) 85.76 43 8.2x10^8
P60422 50S ribosomal protein L2 (rplB) 67.40 24 1.9x10^9
P17169 Glucosamine--fructose-6-phosphate aminotransferase (glmS) 79.31 38 1.8x10^8
P0AA43 Ribosomal small subunit pseudouridine synthase A (rsuA) 85.71 17 8.9x10^8
P0A9K9 FKBP-type peptidyl-prolyl cis-trans isomerase (slyD) 68.88 7 3.7x10^9
P0ACJ8 Catabolite gene activator (crp) 82.86 18 5.4x10^8
P45395 Arabinose 5-phosphate isomerase (kdsD) 73.17 21 1.2x10^8
P0A6F5 60 kDa chaperonin (groL) 83.94 38 2.8x10^8
P0A9A9 Ferric uptake regulation protein (fur) 78.38 8 1.2x10^9
P08622 Chaperone protein DnaJ (dnaJ) 72.07 19 1.4x10^9
P00393 NADH dehydrogenase (ndh) 59.22 16 3.6x10^8
Extended Data Table 3 | Mass spectrometry analysis of protein bands from the purified Cas9-Cas1-Cas2-Csn2 complex


Protein % Coverage Unique Peptides Total peak area 
Cas1 67.82 26 3.4x10^8
Cas2 90.27 13 1.2x10^9
Cas9 68.49 111 4.1x10^8
Csn2 82.27 19 4.1x10^8
Extended Data Table 4 | Oligonucleotides used in this study


Primer Sequence
B337 gacgctatttgtgccgatagctaagcctattgagtatttc
B338 gaaatactcaataggcttagctatcggcacaaatagcgtc
B339 ggaaactttgtggaacaatggcatcgacatcataatcact
B340 agtgattatgatgtcgatgccattgttccacaaagtttcc
B532 ctttttccgtgatggtaactgttcatatttatcagagctcgtg
B534 gagctctgataaatatgaacagttaccatcacggaaaaagettatg
B616 ttattttaattatgctctatcaa
B617 gagtgatcgttaaatttatactgc
H016 aggagetgactgatgggagttcctgaatttaggatatgag
H017 taaattcaggaactcccatcagtcacctcctagctgactc
H018 ttaggatatgagtgaggcttttgatgaatcttaatttttc
H019 ttcatcaaaagcctcactcatatcctaaattcaggaactc
H020 tttgatgaatcttaataaaaatatggtataatactcttaa
H021 ttataccatatttttattaagattcatcaaaagcctcccc
H029 aaacaaaaatgttttaacacctattaacgtagtatg
H030 aaaacatactacgttaataggtgttaaaacattttt
H049 aaactgcgectggttgatttcttcttgcgctttttg
H050 aaaacaaaaagcgcaagaagaaatcaaccagcgca
H166 gaaatgtgagaagggacctctgataaatatgaacatgatgagtgatcg
H167 ggactcttttatctctactcgtgctataattatactaattttataaggagg
H168 agtataattatagcacgagtagagataaaagagtcctttggatgattcc
H169 tgttcatatttatcagaggtcccttctcacatttcaatactagactc
H176 ttgatagagcataattaaaataagatgccactcttatccatcaatcc
H177 gcagtataaatttaacgatcactctaaaacctctccaactacctccc
H182 nnnnncagcaaaattttttagacaaaaatagtc
H183 nnnnncagaagaagaaatcaaccagcgc
H227 taatggcaggttggagaacagtagtc
H228 actactgttctccaacctgccattagtcacctcctagctgactc
H229 agatttttcaaataaggagaaatgtttgaaatcatcaaactcattatggatttaatttaaactttttattttagg
H230 acatttctccttatttgaaaaatctaaatttatagaaattattatacgc
H231 aactttttattttaggaggcaaaaagcgtataataatttctataaatttagatttttcaaataagg
H232 ttttgcctcctaaaataaaaagtttaaattaaatccataatgag
H233 tgatggctggttggcgtac
H234 caacagtacgccaaccagccatcaaccctctcctagtttggc
H237 ggcgtactgatgaagattatttcttaataactaaaaatatgg
H238 tttagttattaagaaataatcttcatcagtacgccaaccagcc
H276 ttgatcaaaaacaatatacgtctacaaaagaag
H277 tagacgtatattgtttttgatcaattgttgtatcaa
H289 agcgcttgggagaaattcaaagaaatttatcagcc
H290 tttctttgaatttctcccaagcgctttcaaaacgc
H312 gatattatggcaccatttaggcctttagtgg
H313 aaaggcctaaatggtgccataatatcgctagc
H336 catactcaattggacttgctattggaacgaatagtgttg¢g
H337 cgttccaatagcaagtccaattgagtatggcttagtc
H338 gtaattatgatattgatgctattattcctcaagc
H339 gaggaataatagcatcaatatcataattacttaatc
JW3 aaaacagcatagctctaaaacg
JW4 aaaacagcatagctctaaaaca
JW5 aaaacagcatagctctaaaact
JW8 ggcttttcaagactgaagtctag
L400 cgaaattttttagacaaaaatagtc
oGG82 aacattgccgatgataacttgag
oGG83 gttttgggaccattcaaaacagcatagctctaaaacctcgtag
PS192 CGCGGATCCATGGCTGGTTGGCGTACTGTTGTGG
PS193 CGCCTCGAGTCATATCCTAAATTCAGGAACTCC
PS199 CGAGCATATGACGACCTTCGATATGATCGGCAATGTTGAATGGAGACCATTC
PS200 GAATGGTCTCCATTCAACATTGCCGATCATATCGAAGGTCGTCATATGCTCG
PS202 CATCATCATCATCATCACAGCAGCGGCATGGATAAGAAATACTCAATAGG
PS203 CCTATTGAGTATTTCTTATCCATGCCGCTGCTGTGATGATGATGATGATG
PS204 CGACAAGCTTGCGGCCGCACTCGAGCTTTTTATTTTAGGAGGCAAAAATG
PS205 GGATCTCAGTGGTGGTGGTGGTGGTGTACCATATTTTTAGTTATTAAGAAATAATC
PS206 GATTATTTCTTAATAACTAAAAATATGGTACACCACCACCACCACCACTGAGATCC
PS207 CATTTTTGCCTCCTAAAATAAAAAGCTCGAGTGCGGCCGCAAGCTTGTCG
PS284 GCTAGCGATATTATGGCACCATTTAGGCCTTTAG
PS285 CTAAAGGCCTAAATGGTgCCATAATATCGCTAGC
PS334 TACTTCCAATCCAATGCAATGAGCTATCGCTATATG
PS335 TTATCCACTTCCAATGTTATTATTAGCTTTCATCAAAGGC
PS336 CGCGGATCCATGAACCTGAACTTTAGCCTGCTGG
PS337 CGCCTCGAGTTACACCATATTTTTGGTAATCAG
PS354 GTTCCTGAATTTAGGATATGAAACATTGCCGATCATATCGAAGG
PS355 CCTTCGATATGATCGGCAATGTTTCATATCCTAAATTCAGGAAC
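Several consecutive primers in the table read as exact reverse complements of one another, as would be expected for primer pairs used in site-directed mutagenesis (that intended use is an assumption; the table itself does not state it). The sketch below checks this for two pairs copied from the table. The helper names `reverse_complement` and `is_complementary_pair` are hypothetical and not taken from the source.

```python
# Translation table for complementing DNA, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences copied from Extended Data Table 4.
PRIMERS = {
    "B337": "gacgctatttgtgccgatagctaagcctattgagtatttc",
    "B338": "gaaatactcaataggcttagctatcggcacaaatagcgtc",
    "B339": "ggaaactttgtggaacaatggcatcgacatcataatcact",
    "B340": "agtgattatgatgtcgatgccattgttccacaaagtttcc",
}

def is_complementary_pair(name_a: str, name_b: str) -> bool:
    """True if primer b is exactly the reverse complement of primer a."""
    return reverse_complement(PRIMERS[name_a]) == PRIMERS[name_b]

if __name__ == "__main__":
    for a, b in [("B337", "B338"), ("B339", "B340")]:
        print(a, b, is_complementary_pair(a, b))
```

The same check is a convenient way to spot transcription errors in a primer list: a pair that is expected to be complementary but fails the test usually differs at a single mistyped base.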
Regulation of star formation in giant galaxies by
precipitation, feedback and conduction 


G. M. Voit, M. Donahue, G. L. Bryan & M. McDonald


The Universe’s largest galaxies reside at the centres of galaxy clusters
and are embedded in hot gas that, if left undisturbed, would cool
quickly and create many more new stars than are actually observed.
Cooling can be regulated by feedback from accretion of cooling gas
onto the central black hole, but requires an accretion rate finely tuned
to the thermodynamic state of the hot gas. Theoretical models in
which cold clouds precipitate out of the hot gas via thermal instability
and accrete onto the black hole exhibit the necessary tuning. Recent
observational evidence shows that the abundance of cold gas in the
centres of clusters increases rapidly near the predicted threshold for
instability. Here we report observations showing that this precipitation
threshold extends over a large range in cluster radius, cluster
mass and cosmic time. We incorporate the precipitation threshold
into a framework of theoretical models for the thermodynamic state
of hot gas in galaxy clusters. According to that framework, precipitation
regulates star formation in some giant galaxies, while thermal
conduction prevents star formation in others if it can compensate
for radiative cooling and shut off precipitation.

Our framework can be expressed in terms of the time $t_{\rm cool}$ required
for the hot gas to radiate an amount of energy equivalent to its current
thermal energy. If intracluster gas were unable to cool, cosmological
structure formation via hierarchical merging would produce galaxy clusters
with radial cooling-time profiles that are similar to a baseline profile
$t_{\rm base}(r)$, which can be computed with numerical simulations. Massive
galaxy clusters are observed to converge to this baseline profile at large
radii, but radiative cooling cannot be ignored at smaller radii, where
$t_{\rm cool}$ can be much shorter than the age of the Universe. Gas at small
radii must either cool and condense, or its cooling must trigger
thermal feedback that compensates for the radiative losses.

Thermal conduction is capable of compensating for cooling in cluster
gas with $t_{\rm cool} > 1$ billion years (Gyr). Our framework therefore
includes a locus of conductive balance, $t_{\rm cond}(r)$, along which thermal
conduction exactly balances radiative cooling. The locus itself is unstable,
because conduction outcompetes cooling if $t_{\rm cool}$ is above that locus but
cannot compete below it. Conduction should therefore drive gas above
the locus towards an isothermal core profile $t_{\rm iso}(r)$ identical to the
baseline profile at large radii but with a constant temperature equal to the
peak temperature of the baseline profile at smaller radii. Clusters in an
isothermal core state have central cooling times exceeding ~1 Gyr, and
so mergers with other galaxy clusters, which occur on timescales of several
billion years, can compete with cooling and further raise $t_{\rm cool}$ in the
cores of those objects. Once $t_{\rm cool}$ exceeds the 14-Gyr age of the
Universe, radiative cooling can no longer lower $t_{\rm cool}$, and this
threshold corresponds to the ‘no cooling’ profile in our framework.

Clusters with cooling-time profiles that go below the locus of conductive
balance require another heat source to balance cooling, and observations
have shown that outflows emanating from a central supermassive
black hole are sufficiently energetic to stop the cooling. However, the
triggering mechanism for that feedback response remained elusive until
recent numerical simulations provided the missing puzzle piece.
Those simulations show that cold clouds start to precipitate out of hot-gas
atmospheres in a state of global thermal balance when $t_{\rm cool}$ drops to ten
times the free-fall time $t_{\rm ff} = (2r/g)^{1/2}$, where $g$ is the local
gravitational acceleration. The resulting precipitation feeds the central
black hole through a ‘chaotic cold accretion’ process, producing a combination
of thermal and kinetic feedback that maintains the necessary state of overall
thermal balance. Sporadic eruptions of feedback then cause the
minimum value of $t_{\rm cool}/t_{\rm ff}$ to fluctuate within the range
$5 < t_{\rm cool}/t_{\rm ff} < 20$.
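The precipitation criterion just described can be written down in a few lines. The following is a minimal sketch, assuming the free-fall time $(2r/g)^{1/2}$ and the fluctuation range 5 to 20 quoted above; the function names and the choice to return both a flag and the ratio are illustrative assumptions, not part of the source.

```python
import math

def freefall_time(r: float, g: float) -> float:
    """Free-fall time t_ff = sqrt(2 r / g); r and g in consistent units."""
    return math.sqrt(2.0 * r / g)

def precipitation_zone(t_cool: float, r: float, g: float,
                       lower: float = 5.0, upper: float = 20.0):
    """Return (inside_zone, ratio) for the ratio t_cool / t_ff.

    The bounds 5 and 20 are the range quoted in the text for the
    minimum of t_cool / t_ff in simulated clusters.
    """
    ratio = t_cool / freefall_time(r, g)
    return lower < ratio < upper, ratio

if __name__ == "__main__":
    # Dimensionless toy numbers: r = 2, g = 1 gives t_ff = 2.
    inside, ratio = precipitation_zone(t_cool=20.0, r=2.0, g=1.0)
    print(inside, ratio)
```

Any consistent unit system works, because only the ratio of the two timescales enters the criterion.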

We compute the critical profile for precipitation by assuming a
two-component gravitational potential. The first component is a mass-density
profile $\rho \propto (3r/r_{500})^{-1}[1+(3r/r_{500})]^{-2}$ in which the mean
density within $r_{500}$ is 500 times the cosmological critical density, and
$r_{500}$ depends on the cluster’s gas temperature via
$kT \approx 125\,\mu m_p [H(z) r_{500}]^2$,
where $H(z)$ is the Hubble expansion parameter at the cluster’s cosmological
redshift $z$ and $\mu m_p$ is the mean mass per gas particle. The second
component is a singular isothermal sphere (mass density $\propto r^{-2}$) with
a velocity dispersion of 250 km s$^{-1}$ to represent the stellar mass profile
of the central galaxy. Defining $t_{\rm precip}(r) = 10\,t_{\rm ff}$ then yields
the critical profile for precipitation of cold clouds out of the hot gas.
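As an illustration of the two-component potential, the sketch below evaluates the precipitation profile numerically. It is not the authors' code. The values of the mean molecular weight (0.6) and the Hubble parameter (70 km/s/Mpc, redshift zero) are assumptions introduced here, as are all function names. The halo mass is normalized using the stated mean density of 500 times critical, which gives $G M_{500} = 250 H^2 r_{500}^3$, and the isothermal sphere contributes $g = 2\sigma^2/r$.

```python
import math

# Physical constants (SI units)
M_P = 1.67262192e-27         # proton mass, kg
KEV = 1.602176634e-16        # 1 keV in joules
KPC = 3.0856775814913673e19  # 1 kpc in metres
GYR = 3.15576e16             # 1 Gyr in seconds

MU = 0.6                     # mean molecular weight (assumed, not from the text)
H0 = 70.0e3 / (1.0e3 * KPC)  # 70 km/s/Mpc in 1/s (assumed, not from the text)
SIGMA = 250.0e3              # central-galaxy velocity dispersion, m/s (from the text)

def r500(kT_keV: float, H: float = H0) -> float:
    """Invert kT ~ 125 mu m_p [H r500]^2 for r500, in metres."""
    return math.sqrt(kT_keV * KEV / (125.0 * MU * M_P)) / H

def _halo_mass_shape(x: float) -> float:
    """Enclosed-mass shape of the profile rho ~ x^-1 (1 + x)^-2."""
    return math.log(1.0 + x) - x / (1.0 + x)

def gravity(r: float, kT_keV: float, H: float = H0) -> float:
    """Total gravitational acceleration (m/s^2) at radius r (metres)."""
    r5 = r500(kT_keV, H)
    gm500 = 250.0 * H**2 * r5**3  # G * M500 from mean density = 500 rho_crit
    gm_halo = gm500 * _halo_mass_shape(3.0 * r / r5) / _halo_mass_shape(3.0)
    g_halo = gm_halo / r**2
    g_galaxy = 2.0 * SIGMA**2 / r  # singular isothermal sphere
    return g_halo + g_galaxy

def t_precip(r: float, kT_keV: float, H: float = H0) -> float:
    """Precipitation threshold 10 * t_ff, in seconds."""
    return 10.0 * math.sqrt(2.0 * r / gravity(r, kT_keV, H))

if __name__ == "__main__":
    for r_kpc in (10.0, 30.0, 100.0):
        print(r_kpc, "kpc:", t_precip(r_kpc * KPC, 6.0) / GYR, "Gyr")
```

Under these assumptions the isothermal sphere and the halo contribute comparably at about 10 kpc and the halo dominates farther out, so the threshold rises with radius.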

There are thus three ‘attractor’ profiles for cluster cores: (1) dynamical
heating via mergers will push hot gas towards a long-lived state
with $t_{\rm cool} > 14$ Gyr, (2) thermal conduction will drive hot gas above the
conductive-balance locus towards $t_{\rm iso}(r)$, and (3) hot gas below the
conductive-balance locus will cool, sink into the central galaxy, fall into
a precipitating state, and trigger feedback that prevents $t_{\rm cool}$ from
dropping much below $10\,t_{\rm ff}$.

Comparing this framework of models with cooling-time profiles
derived from the ACCEPT galaxy-cluster database strongly supports
the hypothesis that precipitation regulates cooling and star formation
in massive galaxies (Fig. 1). The lower envelope of the $t_{\rm cool}(r)$ data
closely follows the $\max[t_{\rm precip}(r), t_{\rm base}(r)]$ boundary over
multiple orders of magnitude in radius, multiple orders of magnitude in cooling
time, and more than an order of magnitude in system temperature. It even
reproduces the kink at the intersection of $t_{\rm precip}(r)$ and
$t_{\rm base}(r)$, confirming that the mechanism which regulates cooling and
star formation in the Universe’s largest galaxies prevents $t_{\rm cool}$ from
dropping much below $10\,t_{\rm ff}$. This is an important finding, even if the
precipitation-driven feedback model turns out to be incorrect, because it shows
that the mechanism preventing runaway cooling in cluster cores depends
critically on the $t_{\rm cool}/t_{\rm ff}$ ratio.

The data also imply that thermal conduction separates precipitating
clusters from non-precipitating clusters because the locus of unstable
conductive balance neatly divides systems with multiphase gas from
those without it. Detections of Hα and far-infrared emission from cluster
cores indicate the presence of multiphase gas, and the cooling-time
profiles of all multiphase cluster cores either drop below $t_{\rm cond}(r)$ or
are in its vicinity. In contrast, nearly all of the clusters without observable
Hα emission stay above $t_{\rm cond}(r)$. The few single-phase cluster cores
that dip below $t_{\rm cond}(r)$ may be objects in transition to a precipitating
state because they are still outside the precipitation zone at
$5 < t_{\rm cool}/t_{\rm ff} < 20$. According to our framework, their multiphase
counterparts with $20\,t_{\rm ff} < t_{\rm cool} < t_{\rm cond}(r)$ are likely to
be systems in which a large burst
of feedback has temporarily shut off precipitation but has not yet boosted
Figure 1 | Hot-gas cooling time as a function of radius in galaxy clusters.
The observed ratio of cooling time to freefall time exhibits a hard floor at
approximately 10, in accordance with model predictions for
precipitation-driven feedback. In a, dashed blue lines show cooling-time
profiles for all objects in the ACCEPT database with gas temperatures in the
2-10 keV range and Hα detections of multiphase gas. Solid purple lines show all
0.5-2.0 keV objects in ACCEPT with far-infrared detections of multiphase gas.
The lower envelope of the cooling-time profiles closely follows the boundary
defined by the precipitation threshold at $t_{\rm cool}/t_{\rm ff} \approx 10$
(thick magenta line) and the cosmological baseline profile (brown), and most of
those profiles enter the zone at $5 < t_{\rm cool}/t_{\rm ff} < 20$ (pink),
within which precipitation-driven feedback stabilizes simulated galaxy
clusters. The upper end of the $t_{\rm cool}$ envelope for multiphase systems
lies in the vicinity of the locus of unstable conductive
balance (cyan), indicating that thermal conduction eliminates multiphase gas
above that locus. In b, dashed red lines show cooling-time profiles for all
2-10 keV objects in the ACCEPT database with no observable Hα emission.
None of those profiles enters the precipitation zone, nearly all are above the
locus of conductive balance, and most are between the isothermal core profile
(green) and the cooling threshold (orange) at which the minimum cooling time
equals the age of the Universe. All of the thick solid lines show model
predictions for a 6 keV cluster, and purple tags indicate the core entropy index
($K_0$ in keV cm$^2$) at this temperature. An error bar near the upper right
corner shows the typical uncertainty range (2 s.d.) for $t_{\rm cool}$, which
comes primarily from the statistical uncertainty in gas temperatures derived
from Chandra X-ray spectroscopy.
Figure 2 | Hot-gas cooling time as a function of radius in galaxy clusters of
differing temperatures. All lines are colour-coded as in Fig. 1. When grouped
by temperature, all of the Hα-emitting clusters have profiles that dip below
the locus of conductive balance, while only three of the no-Hα clusters dip
below it. None of those three enters the pink zone corresponding to the
$t_{\rm cool}/t_{\rm ff}$ excursions seen in simulations of precipitation-driven
feedback, suggesting that the three clusters may be objects in which
precipitation has not yet begun. In the yellow regions, our model predicts that
thermal conduction should be heating gas and driving it to the isothermal-core
state. If thermal conduction is indeed responsible for separating the
$t_{\rm cool}$ profiles of Hα and no-Hα clusters,
then the degree of separation should increase with temperature. The main effect
of increasing temperature is to drive the locus of conductive balance closer to
the precipitation threshold, narrowing the range of $t_{\rm cool}/t_{\rm ff}$
within which multiphase gas can persist. This trend appears to be present in
the data but with marginal statistical significance. For Hα-emitting clusters
in the 2-7 keV range, we find that the mean value of
$\min[t_{\rm cool}/t_{\rm ff}]$ is 20.9 ± 1.7 with a standard
deviation of 9.5. Among Hα-emitting clusters in the 7-15 keV range, both the
average value of $\min[t_{\rm cool}/t_{\rm ff}]$ and the dispersion are lower,
with a mean of 15.7 ± 1.7 and a standard deviation of 5.6.
Figure 3 | Evolution of radial gas density and cooling-time profiles in galaxy
clusters. a, Evolution of electron density $n_e$. Objects at mean cosmological
redshift <z> = 0.12 are from the Chandra Cluster Cosmology Project, and
objects at <z> = 0.5 and 1.0 are from the South Pole Telescope survey.
Cosmological scaling has been removed through division of $r$ by $r_{500}$ and
division of $n_e$ by $\rho_{\rm cr} f_b m_p^{-1}$, where $\rho_{\rm cr}$ is the
cosmological critical density and $f_b$
is the fraction of cosmic mass in baryonic form. Thin lines show cluster
observations. Solid thick magenta and brown lines show the precipitation limit
and baseline profile, respectively, corresponding to a reference temperature
of 6 keV. Dashed lines show the precipitation and baseline profiles for the
low-redshift subsample at <z> = 0.12. The gas-density contrast between a
core near the precipitation limit and the outer part of the baseline profile


$t_{\rm cool}$ high enough for conduction to eliminate the multiphase gas. Those
cluster cores should cool and return to active precipitation within a few 
hundred million years. 

Subdividing clusters according to temperature strengthens the case
for thermal conduction (Fig. 2). Our framework predicts that clusters
with cooling-time profiles between the conductive-balance locus and
the isothermal-core profile should be rare, because thermal conduction
should be driving $t_{\rm cool}$ from $t_{\rm cond}(r)$ towards
$t_{\rm iso}(r)$. The data show that the zone between $t_{\rm cond}(r)$ and
$t_{\rm iso}(r)$ is indeed systematically depopulated and suggest that it grows
larger with increasing temperature, in
accord with the strong temperature dependence of thermal conduction
in astrophysical plasmas. We note also that in every temperature range,
the lower edge of the $t_{\rm cool}$ envelope closely follows the joint
precipitation + baseline profile, including the kink at the intersection point,
showing that the floor at $t_{\rm cool} \approx 10\,t_{\rm ff}$ is present in
data across the entire cluster temperature range.

Two predictions for the evolution of galaxy-cluster cores follow from
these considerations. First, the thermodynamic properties of precipitating
cores should remain relatively constant with time, because they
are determined by local conditions and not by cosmological evolution.
Second, the contrast in gas density between a precipitating core and the
outer parts of a cluster should grow more pronounced with time, because
hierarchical structure formation causes the baseline profile to become
less dense as dynamical heating resulting from mergers adds entropy
to the gas and shifts $t_{\rm base}$ upward. X-ray observations of the South Pole
Telescope galaxy-cluster sample support these predictions (Fig. 3).
The limits on central density, entropy, and cooling time of high-redshift
clusters remain similar to those for low-redshift clusters and do not


decreases with increasing redshift. This happens because the Universe as a 
whole is denser at earlier times, whereas gas density at the precipitation limit 
remains nearly constant because it is set by local conditions. b, Evolution of 
hot-gas cooling time. All line styles are identical to those in a. In this unscaled 
representation of the same data, the precipitation limit remains nearly constant, 
while the baseline profile shifts downward with increasing redshift because 
the mean gas density is increasing. Error bars in both panels show a statistical 
uncertainty range equivalent to 2 s.d. One South Pole Telescope cluster in 
the <z> = 0.5 set, shown with a red line, crosses the precipitation limit. 
Notably, it is the Phoenix cluster, which has, by far, the largest central
star-formation rate of all known galaxy clusters. 


violate the precipitation limit, whereas the outer parts remain limited 
by the baseline profile, which is at progressively greater density, lower 
entropy, and shorter cooling time as cluster redshift increases. 
Taken as a whole, this many-faceted correspondence between models 
and data convincingly shows that we now understand what regulates 
cooling and star formation in the Universe’s largest galaxies and raises 
an even bigger question. How far down the galaxy-mass spectrum do 
these principles extend? Precipitation is likely to be a very general
feature of galaxy evolution, in that precipitation-driven feedback owing
to both star formation and accretion onto black holes is likely to
maintain the ambient circumgalactic medium of a star-forming galaxy in a
state with $t_{\rm cool}/t_{\rm ff} \approx 10$. Conversely, galaxies embedded in
ambient gas with $t_{\rm cool}/t_{\rm ff} \gg 10$ have no way of replenishing the
cold gas required for star formation, which therefore wanes. Thermal conduction
is probably less general, given its strong temperature dependence, but stellar
heating mechanisms such as supernova explosions should be of greater
relative importance in lower-temperature systems and may provide an
analogous upper bound on residual precipitation that separates
star-forming galaxies from those in which star formation has ceased.
Ongoing hydrothermal activities within Enceladus
Detection of sodium-salt-rich ice grains emitted from the plume of
the Saturnian moon Enceladus suggests that the grains formed as frozen
droplets from a liquid water reservoir that is, or has been, in contact
with rock. Gravitational field measurements suggest a regional
south polar subsurface ocean of about 10 kilometres thickness located
beneath an ice crust 30 to 40 kilometres thick. These findings
imply rock-water interactions in regions surrounding the core of
Enceladus. The resulting chemical ‘footprints’ are expected to be preserved
in the liquid and subsequently transported upwards to the
near-surface plume sources, where they eventually would be ejected
and could be measured by a spacecraft. Here we report an analysis
of silicon-rich, nanometre-sized dust particles (so-called stream
particles) that stand out from the water-ice-dominated objects
characteristic of Saturn. We interpret these grains as nanometre-sized
SiO2 (silica) particles, initially embedded in icy grains emitted from
Enceladus’ subsurface waters and released by sputter erosion in
Saturn’s E ring. The composition and the limited size range (2 to 8
nanometres in radius) of stream particles indicate ongoing
high-temperature (>90 °C) hydrothermal reactions associated with
global-scale geothermal activity that quickly transports hydrothermal
products from the ocean floor at a depth of at least 40 kilometres up
to the plume of Enceladus.

Dust dynamics provide diagnostic information about the origin of 
the observed dust populations. The dynamical properties of Saturnian 
stream particles show characteristics inherited from Saturn’s diffuse E 
ring’. Considering the long-term evolution of the E ring and dust-plasma 
interactions, our dynamical analysis reproduces the observed charac- 
teristics, confirming their E-ring origin (Methods). Enceladus is the 
source of the E ring and hence the ultimate source of stream particles, 
allowing Enceladus to be probed using stream particle measurements. 

Co-added mass spectra of selected Saturnian stream particles detected 
by Cassini’s Cosmic Dust Analyser (CDA)? (Fig. 1) show silicon as the 
only highly significant particle constituent. Oxygen is the other abund- 
ant possible particle mass line but is also a minor but frequent target 
contaminant”®. The contribution of particle material to the oxygen sig- 
nal is difficult to assess, but its intensity is in agreement with at least a 
fractional contribution from silicates (Methods). Remarkably, only traces 
(at most) of metals are found to contribute to the particle composition, 
indicating that the stream particle spectra are not in agreement with 
those of typical rock-forming silicate minerals (that is, olivine or py- 
roxene). The data are in agreement solely with extremely metal-poor 
(or metal-free) silicon-bearing compounds, of which, besides elemental 
Si, only SiO2 and SiC are of cosmochemical relevance''. Considering
that Si and SiC are highly unlikely to be emitted in significant quantities
from a planetary body, we conclude that the dominant, if not sole,
constituent of most stream particles must therefore be SiO2. Quantitative
mass spectra analysis indicates a radius of r_max = 6-9 nm for the largest
stream particles (Methods). This is in excellent agreement with the upper
particle size limit independently inferred from dynamical simulations
(r_max ≈ 8 nm)’.

The spontaneous, homogeneous nucleation of nanometre-sized col- 
loidal silica is a unique property of the silica-water system. We consider 
this as the production mechanism of the observed silica nanoparticles 
because of (1) the existence of a subsurface ocean in contact with rock 
and (2) the improbability of homogeneous fragmentation of pure bulk 
silica into particles with radii exclusively below 10 nm within Enceladus. 
Only a rock-related, bottom-up formation process is plausible. Col- 
loidal silica nanoparticles form with initial radii of 1-1.5 nm when the 
solution becomes supersaturated’’. In moderately alkaline solutions 
(pH 7.5-10.5) with low electrolyte concentration, the charge state of silica 
nuclei allows colloidal silica nanoparticles to nucleate and grow by addi- 
tion of dissolved silica as well as by Ostwald ripening'*’’. Above about 
pH 10.5, silica solubility becomes too high to maintain a stable colloidal 
phase’’. Laboratory experiments show that after hours to days in a su- 
persaturated solution with a slightly alkaline pH and at various ionic 
strengths, colloidal silica grows to radii of 2-6 nm (refs 14-17), which is 
in good agreement with CDA measurements. 

Both measurements—mass spectra and the narrow size distribution— 
indicate silica nanoparticles but may not provide unequivocal proof 


Figure 1 | Identifying particle constituents. Shown is a co-added impact
ionization mass spectrum from 32 selected Saturnian stream particle spectra
with the strongest Si⁺ signals. As expected, the impacts produce more ions
from the CDA’s target material (Rh⁺ and Rh₂⁺; blue areas) and the target
contaminants®!° (C⁺, H⁺; blue areas, H⁺ not shown) than from the
nanoparticle itself. Ions O⁺ and Si⁺ are the most abundant potential particle
mass lines. Na⁺/Mg⁺ (solidus indicates the two species cannot be
distinguished) form the only other potential particle mass line with a signal-to-
noise ratio above 3σ (dashed line; σ, standard deviation). The particle
composition agrees best with pure silica when the target impurities and the
impact ionization process are taken into account (Methods).
Figure 2 | Minimum temperatures for formation of silica nanoparticles.
a, Solid lines show ΣSiO2 of a serpentine-talc/saponite buffer equilibrium as a
function of temperature (x axis) and pH (line colour: see key above). This
buffer system is consistent with the measured ΣSiO2 in fluid samples of the
hydrothermal experiments using an orthopyroxene and olivine powder
mixture at 400-bar pressure (filled black circles annotated with in situ pH
values; Methods). Dashed lines show the 0 °C silica solubility at the respective
pH. The difference between the solid and dashed lines determines the amount
of ΣSiO2 available for silica nanoparticle formation at the respective pH.
Insets, images of silica nanoparticles formed in cooled solutions.
b, Relationships between minimum hydrothermal fluid temperatures and
fluid pH for silica nanoparticle formation. Red and blue colours represent
results with increasing and fixed pH, respectively, upon cooling and
mixing with seawater. Data points show results for Na⁺ concentration
0.1 mol kg⁻¹ and pressure 30 bar; shaded areas represent the uncertainties
in Na⁺ concentrations (0.05-0.3 mol kg⁻¹) and pressure (10-80 bar; ref. 3).


individually. However, nanosized silica remains as the only plausible 
interpretation of the stream particle measurements when results from 
these two independent analysis methods are incorporated. Moreover, 
the relation between the stability of silica nanoparticles and solvent 
alkalinity matches the pH range of the liquid plume source(s) (about 
8.5-9), as independently inferred from the composition of emitted 
salt-rich ice grains’. 

We can now use silica nanoparticles as a thermometer for the sub- 
surface ocean floor of Enceladus, assuming that such particles form 
owing to the reduction in SiO2 solubility as the water cools’””*. This is the
most common way for silica nanoparticles to form on Earth, and is frequently
observed in alkaline hydrothermal fluids'*’”""’. To determine the relation of silica concentration
versus solution temperature applicable to Enceladus, long-term rock- 
water interaction experiments were conducted. A pressurized solution 
of NaHCO3 and NH3 in water was brought into contact with powdered
primordial minerals (70% olivine and 30% pyroxene) at various tem- 
peratures and for several months (Methods). Hydrothermal alterations 
produced secondary minerals typically found in carbonaceous chondrites,
including serpentine, talc/saponite and magnetite. Our experimental
results (Fig. 2a) show that the total SiO2 concentration in fluids
(ΣSiO2 = SiO2(aq.) + HSiO3⁻ + NaHSiO3(aq.)) in contact with these secondary
minerals is controlled by a serpentine-talc/saponite buffer system:
that is, serpentine + 2SiO2(aq.) ↔ talc/saponite + H2O. This allows
us to calculate the minimum temperature required for silica nanoparticle
formation on cooling of the hydrothermal fluids, that is, the reaction
temperature at which ΣSiO2 exceeds the solubility of amorphous
silica at 0 °C for a given pH. Assuming the fluid pH remains constant 
on cooling, the reaction temperature must reach ~90 °C at pH 10.5, or 
a higher temperature if the fluid pH is below 10.5 (Fig. 2b). Because silica 
solubility increases with fluid alkalinity, the minimum temperature 
allowing silica nanoparticle formation on subsequent cooling rises to 
~190 °C at pH 10.5 if the hydrothermal fluid pH were to increase by 
one when mixing with the subsurface ocean water (Methods and ref. 20). 

It is not clear how steep the temperature gradient across the subsur- 
face ocean is. However, the ocean is most likely to be convective if the 
minimum temperature allowing silica nanoparticle formation on subsequent
cooling (that is, >90 °C) at the rock-water interface is achieved.
We believe that most silica nucleation and initial growth would occur 
when the hydrothermal fluids reach the relatively cold ocean water at 
the ocean floor. The growth of silica nanoparticles may continue as the 
hydrothermal fluids ascend (Fig. 3). 

For comparison, the average concentration of silica nanoparticles in 
their icy E-ring ‘carrier grains’ can be estimated using the measured and 
modelled stream particle production rate (Fig. 4 and Methods). Albeit 
with large uncertainties, a conservative lower limit still requires the formation
of 150 p.p.m. of silica nanoparticles, equivalent to a solution supersaturated
by about 2.5 mM SiO2 that was available to form the
observed nanoparticles. Such a high nanosilica abundance requires a
high temperature gradient at a pH of at least 8.5, and cannot be explained
solely by incorporation of dissolved silica on freezing of water droplets 
in the vents”. The high abundance and specific sizes of stream particles 
both indicate that they existed in colloidal form before their integration 
into ice grains. 
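
As a rough check of the quoted equivalence between 150 p.p.m. and about 2.5 mM, the conversion can be sketched as follows. The sketch assumes that p.p.m. denotes a mass fraction (mg of SiO2 per kg of water) and that the dilute solution has a density close to 1 kg per litre, so that mmol per kg can be read as mM. The function name is illustrative and not taken from the paper.

```python
# Molar mass of SiO2 in g/mol (Si 28.086 + 2 x O 15.999).
M_SIO2 = 28.086 + 2 * 15.999


def ppm_to_mmol_per_kg(ppm_by_mass, molar_mass=M_SIO2):
    """Convert a mass fraction in p.p.m. (mg per kg) to mmol per kg.

    mg/kg divided by g/mol gives mmol/kg directly.
    """
    return ppm_by_mass / molar_mass


# Assumption: dilute solution, so mmol per kg is approximately mM.
conc = ppm_to_mmol_per_kg(150.0)
print(f"150 p.p.m. SiO2 is about {conc:.2f} mmol per kg")
```

Under these assumptions the result is close to the 2.5 mM quoted in the text.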

The existence of silica nanoparticles also provides strict constraints 
on the salinity of Enceladus’ subsurface waters because silica colloids 
aggregate and precipitate quickly at high ionic strength’*"*. The critical 
coagulation concentration of NaCl at pH 9 is 2% or ~0.3 M (1.5% or 
~0.2 M at pH 10, 4% or ~0.6 M at pH 8)”. This sets an upper salinity 
limit of about 4% for the location where silica nanoparticles form at 
depth, as well as for the near-surface plume sources, and corresponds 
to the lower salinity limit of 0.5% derived earlier’. Partial freezing of the 
water would increase the salinity and would result in immediate silica 
precipitation”’, suggesting that the observed silica nanoparticles have 
never ‘seen’ a brine. This also implies that the observed silica nanopar- 
ticles were produced during the present active phase of Enceladus. 

The growth of colloidal particles sets another constraint on the life- 
time of the silica nanoparticles. For example, through Ostwald ripening”, 
nanosilica would grow to micrometre-sized grains within a few thou- 
sand years or less (Methods). The observed radii, below 10 nm, imply 
the continuous and relatively fast upward transportation of hydrother- 
mal products (see, for example, ref. 22), from ongoing hydrothermal 
activities in the subsurface ocean to the plume sources close to the sur- 
face, over months to several years at most (Methods). 

Our results show that two very different dust populations detected by 
Cassini—that is, micrometre-sized ice grains'****** and nanometre-sized 
Figure 3 | A schematic of Enceladus’ interior. The internal structure and
conditions of Enceladus beneath its south polar region derived from this and 
previous work. The main components (core, subsurface ocean, ice crust and 
plume) are shown left to right; top row gives temperature and chemical 


silica stream particles—in fact have the same origin but probe the con- 
ditions of the subsurface water of Enceladus at different depths: the 
silica nanoparticles probe the pH, salinity and water temperature at the 
bottom of Enceladus’ ocean, while the micrometre-sized ice dust grains 
reveal composition and thermal dynamical processes at near-surface 
liquid plume sources and in the vents'*”* (Fig. 3). The current plume 
activity is probably not superficial but a large, core-to-surface-scale pro- 
cess. The low core densities implied by Cassini’s gravitational field mea- 
surements’ as well as the low pressure of the mantle resting on the core” 
are in good agreement with a porous core. This would allow water to 
percolate through it, providing a huge surface area for rock-water
interactions, and the high temperatures (>90 °C) implied by our
observations might occur deep inside Enceladus’ core.
Figure 4 | Concentration of silica nanoparticles in E-ring grains. The mass
fraction of silica nanoparticles in E-ring ice grains is estimated by comparing
the production rates derived from the dynamical model (sloping red lines)
and CDA measurements (blue horizontal line and shaded region). We assume
that the stream particle release rate is directly proportional to the E-ring
sputtering erosion rate. The steeper the power-law size distribution slope (μ),
the larger the total surface area of E-ring grains and thus the higher the
production rate of silica nanoparticles. The lower limit for the nanosilica mass
fraction is ~150 p.p.m. (equivalent to 2.5 mM shown in the lower x axis)
with μ = 5.4 (yellow dashed line)”.
properties of each component, middle row shows schematic structure, and
bottom row gives physical properties. Distances labelling the grey line below the 
middle row are distances from the centre of Enceladus towards its south pole 
(not to scale). 
METHODS

Dynamics analysis. The dynamical properties derived from nanodust-solar wind
interactions (ejection speed: 50-200 km s⁻¹, charge-to-mass ratio: 1,000-20,000 C kg⁻¹
or 2-8 nm in radius assuming +5 V surface potential) link stream particles to an
ejection region (ER) at ~8 R_S (R_S is the Saturn radius, 60,268 km) from Saturn’. The
ER is defined as the region where charged nanoparticles start to gain energy from 
the co-rotation electric field to escape from the gravity of Saturn. Considering the 
energy conservation, the ER distribution thus represents the distribution of stream 
particles’ dynamical properties as they are ejected from the system (equation (5) in 
ref. 7) and reflects the effects of the dust charging process as well as the source
location’. The ER peak location indicates that their source extends from the inner
system to over 8 R_S with strength decreasing outward’. It was therefore proposed that
stream particles are nanometre-sized Si-rich inclusions in E-ring ice grains released 
through plasma sputtering erosion’, as such sputtering is more corrosive on water 
ice than on Si-bearing minerals’*. Enceladus is the dominant source of E-ring ice 
grains”’, suggesting that stream particles also originated from Enceladus. 
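
The correspondence between the quoted charge-to-mass ratios and the 2-8 nm radius range can be illustrated with a short sketch. It assumes that each grain is an isolated sphere charged to the stated +5 V surface potential, so that its charge is 4πε0·r·φ, and it assumes a bulk density of 2,200 kg m⁻³ (the value given later in the Methods for amorphous silica). The paper does not state which density underlies this particular conversion, so the numbers are indicative only and the function name is hypothetical.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def charge_to_mass(radius_m, potential_v=5.0, density=2200.0):
    """Charge-to-mass ratio (C/kg) of an isolated charged sphere.

    Assumptions (not from the paper): sphere capacitance 4*pi*eps0*r and
    a uniform density of 2,200 kg/m^3.
    """
    charge = 4.0 * math.pi * EPS0 * radius_m * potential_v
    mass = (4.0 / 3.0) * math.pi * radius_m**3 * density
    return charge / mass


qm_small = charge_to_mass(2e-9)  # 2 nm radius
qm_large = charge_to_mass(8e-9)  # 8 nm radius
print(f"r = 2 nm: {qm_small:,.0f} C/kg")
print(f"r = 8 nm: {qm_large:,.0f} C/kg")
```

Because the ratio scales as 1/r², a factor of four in radius spans a factor of sixteen in charge-to-mass ratio, which is of the same order as the range quoted above.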

One major difference between the E ring and other tentative sources (for example,
Saturn’s main rings and moons) is the vertical extension. E-ring grains can obtain
significant inclination because of solar radiation pressure as well as gravitational 
perturbations from embedded moons™”*”’. Nanoparticles released from the E ring 
would inherit the inclination, as the magnetic field of Saturn aligns well with its 
rotation axis”. To examine the proposed hypothesis, we adopt numerical simu- 
lations to reconstruct the emission patterns of the Saturnian stream particles, as 
described below: 
The sputtering mass loss rate of E-ring grains. Trajectories of ice grains with initial 
radius, r, between 0.1 and 5 μm from the dynamics model” are used to reconstruct
the E-ring profile. E-ring grains follow a power-law size distribution, n(r) ∝ (r/r0)^(-μ),
where r is the grain radius and μ ranges from 4.8 to 5.4 (ref. 24). Weighted
with the initial size distribution and normalized to the dust density recorded at the 
orbit of Enceladus”, the simulated trajectories are binned to a two-dimensional, 
axially symmetric dust density map (Extended Data Fig. 1a). 

The dust size distribution and the mean dust-plasma relative speed are used to
calculate the sputtering mass loss rate of E-ring grains at a given torus segment (ρ, z),
expressed by

$$ \dot{m}(\rho, z) = \int \Phi_{\mathrm{sput}}(\rho, z) \times A(r, \rho, z) \times m_{\mathrm{H_2O}} \, \mathrm{d}r $$

where A(r, ρ, z) is the surface area of E-ring grains with radius r in a given torus segment,
ρ and z are the distance to the rotation axis of Saturn and to the ring plane,
respectively, and m_H2O is the mass of a water molecule. The sputtering yield (Φ_sput)
of icy surface in Saturn’s magnetosphere is governed by the elastic nuclear collisions
from the thermal magnetospheric plasma ions**** and can be written (equation (4)
in ref. 33) as:


; U(p.z) x ni(p.2) 


Psput(p.Z) =f. 4 x Y(E;,0), 


x~0.3+0.13 In(m;) 


where u(ρ, z) is the relative speed between E-ring grains and the magnetospheric
plasma ions, n_i(ρ, z) is the plasma ion density, and m_i = m_H2O. We use an ad hoc
plasma model based on the Cassini measurements’. θ_max = 80° is the ion incident
angle beyond which the sputtering yield rapidly decreases*’. Y(E_i, 0) is the
plasma ion sputtering yield of water ice****. The resulting E-ring mass loss rate is
shown in Extended Data Fig. 1b.

Stream particle production rate. This is defined as the amount of escaping nanosilica
particles per unit time. Under the assumption that the stream particle production
is proportional to the E-ring ice mass loss rate (ṁ), the production rate (w) is
written as:

$$ w(r_{sp}, \rho, z) = w'(r_{sp}, \rho, z) \times f_{sp} \times f_{eff} \qquad (1) $$

$$ w'(r_{sp}, \rho, z) = \frac{\dot{m}(\rho, z)}{m_{sp}} \times P_{mass} \times P_{eject} \qquad (2) $$

where f_sp is the mass ratio of silica nanoparticles with respect to the water ice in
E-ring grains. f_eff is the efficiency of nanosilica release via the plasma sputtering
process, which depends on the location distribution of nanosilica particles within
the ice grain as well as the efficiency of plasma sputtering erosion processes. w', r_sp,
m_sp, P_mass and P_eject are the normalized production rate, the radius, the mass, the mass
distribution function, and the ejection probability of nanosilica stream particles,
respectively. Based on the derived size range’, we assume that stream particles
follow a Gaussian distribution with a mean at 4 nm and variance of 2 nm. P_eject is
calculated from the nanodust ejection model described below. The normalized
production rate of 5 nm silica particles is shown in Extended Data Fig. 1c.
Dynamical evolution of charged nanoparticles. The predominant acceleration of 
charged nanoparticles in Saturn’s magnetosphere stems from the outward-pointing 
co-rotation electric field’’*”°. In the first order, only positively charged dust part- 
icles gain energy and escape. Therefore, the fate of nanoparticles depends on the 
charging processes, that is, the plasma conditions at the location where they are 
released. Using the plasma model described previously, the ejection probability map 
of nanodust particles is simulated. See ref. 7 for the modelling details of the sto- 
chastic charging process and the equation of motion of nanoparticles. 

Extended Data Fig. 2 shows the P_eject maps for 5 nm silica and water ice particles.
A successful ejection event is defined as one in which the required ejection time of a test
particle is less than half of its sputtering lifetime. The sputtering yield of water ice is
about an order of magnitude larger than that of silicates (for example, Φ_ice ≈ 1.5 and
Φ_silicate ≈ 0.15 for incident He ions in the 500-1,000 eV energy range”’). We assume
that the sputtering lifetime for silica is ten times longer than that of water ice. The 
particle size decrease due to sputtering is not considered in the simulation’. 

The emission patterns. The dynamical properties of charged test nanoparticles released
from the E ring simulated in the above step are converted to ER (equation (5)
in ref. 7) and the latitudinal emission pattern, weighted by the normalized production
rate (equation (2)) according to their initial location, as shown in Extended
Data Fig. 3a, b. We also modelled the emission patterns assuming that nanosilica 
particles are ejected directly from Enceladus, to examine our hypothesis (Extended 
Data Fig. 3c, d). 

The nanosilica colloid concentration. f_sp in equation (1) can be determined by comparing
the modelled stream particle production rate (w) with the CDA stream particle
flux measurements, as shown in Fig. 4. The CDA observations are summarized
in Extended Data Table 1. We assume that (1) f_sp and f_eff remain constant through
the lifetime of the E-ring grains, and (2) nanosilica particles are mixed homogeneously in
the ice matrix of E-ring grains (that is, f_eff = 1) so their release is directly proportional
to the sputtering erosion rate. Figure 4 shows that the derived f_sp ranges from
about 150 to 3,900 p.p.m. (parts per million), depending on the adopted E-ring size
distribution slope. The conservative lower limit of the dissolved silica concentration 
at the reaction sites is about 210 p.p.m. (3.5 mM), including the silica solubility at 
0°C (~1 mM, or 60 p.p.m.). This corresponds to minimum reaction temperatures 
of 250°C and 120°C for solution pH values of 9 and 10, respectively (Fig. 2a). 
Spectra analysis. Data set. The Cassini CDA measures the composition of indi- 
vidual dust grains by time of flight (TOF) mass spectroscopy’. Owing to the small 
mass of stream particles, their impact ionization spectra provide only weak particle 
mass lines at best”. In previous investigations®” only Si⁺ at 28 u (u = unified atomic
mass unit) could be identified as a definite particle constituent. Here we aim to go 
to the absolute detection limit possible with CDA. The goal is to quantify the most 
prominent elemental stream particle constituent, silicon®’, and to identify other 
elements that are typically abundant in silicate minerals (for example, magnesium 
or iron). Therefore only spectra with the best particle signals recorded between 
April 2004 and January 2008 were used for this analysis. The main reason to choose 
this period is that it provides the highest quality CDA spectra with the lowest pos- 
sible contamination background. Starting in March 2008, CDA was frequently oper- 
ating deep inside Enceladus’ plume, during which time the refractory constituents 
of Enceladian ice grains, for example, sodium and potassium salts, might have 
accumulated and enhanced the CDA target contamination. 

From the data set of over 2,000 stream particle spectra, 32 spectra with the highest 
signal-to-noise ratio of a 28 u (±0.6 u) signal were selected. A Si⁺ signal amplitude of
0.7 μV was chosen as the selection threshold. This value provides clear Si⁺ signals
as well as a sufficiently large ensemble of spectra. These impact-ionization spectra
also show relatively high total ion production (the sum of ions stemming from 
target material, target contamination and the particle itself). Thus, the selected spec- 
tra probably represent the largest detected stream particles at the highest encounter 
speeds during the observation time. 

The selected spectra probably show the highest abundance of particle material 
(compared to target material and target contamination) and thus provide the high- 
est probability of detecting further elemental particle constituents. Note that even in 
these spectra, ions from particle compounds only amount to about 1%. To further 
enhance the signal-to-noise ratio, the spectra were co-added and ‘Lee’ filtered (Fig. 1). 
Other exemplary spectra of stream particles can be found elsewhere®”’. 

Spectra interpretation. The selected impacts most probably occurred at speeds above 
200 km s⁻¹. In this regime, the energy density is orders of magnitude higher than the
molecular bond energies****. Therefore, similarly to the case of Jovian stream par- 
ticles*’, only elemental ions are produced upon impact. However, subsequent clus- 
tering by collisions of neutral and ionized elements in the impact cloud (before the 
ions ‘feel’ the accelerating potential of the TOF spectrometer) can produce two- 
component ions”. In the case of the data set used for this work, this clustering 
phenomenon is responsible for the formation of bi-elemental cations from the
target material rhodium (Rh₂⁺) (Fig. 1). The ratio of Rh⁺/Rh₂⁺ is about 100. Since
rhodium is probably the most abundant constituent of the impact cloud, the
intensity of this low-level signature marks the upper limit for the abundance of other
non-elemental ions. This also helps to resolve the notorious ambiguity of the 28 u
signature in mass spectrometry that, besides silicon, can in principle be assigned to
cations of N2, CO, CNH2 and C2H4. Carbon and hydrogen are highly abundant
spectrum contaminants from the instrument target; these elements thus cannot be
assessed with respect to the composition of stream particles’®. Although all these
components could potentially contribute to the 28u mass line, their abundance can 
be expected to be very low at most. 
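
The ambiguity of the 28 u line can be made concrete by adding up nominal (integer) atomic masses for silicon and the candidate molecular ions N2, CO, CNH2 and C2H4. This is only an illustrative sketch: it uses nominal masses of the most abundant isotopes and ignores the small mass defects, which a time-of-flight spectrum at this resolution would not separate anyway.

```python
# Nominal masses (in u) of the most abundant isotopes.
NOMINAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "Si": 28}

# Candidate species for the 28 u line, written as element counts.
SPECIES = {
    "Si": {"Si": 1},
    "N2": {"N": 2},
    "CO": {"C": 1, "O": 1},
    "CNH2": {"C": 1, "N": 1, "H": 2},
    "C2H4": {"C": 2, "H": 4},
}


def nominal_mass(composition):
    """Sum the nominal atomic masses for a dict of element counts."""
    return sum(NOMINAL_MASS[el] * n for el, n in composition.items())


masses = {name: nominal_mass(comp) for name, comp in SPECIES.items()}
for name, mass in masses.items():
    print(f"{name:5s} {mass} u")
```

All five species fall on the same nominal mass, which is why the low Rh⁺/Rh₂⁺ clustering ratio is needed to argue that molecular ions contribute little to the line.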

From integration of the spectral peaks, abundances and ratios of cationic species 
in the impact cloud can be directly derived. Ionization probabilities of the different 
species have to be considered to form conclusions regarding the composition of the
particle. This is of particular relevance to reaching a conclusion about the metal to 
silicon ratio in stream particles, one of the main goals of the spectra analysis. All 
metals, especially Mg/Ca and Na/K, have a higher probability of forming cations 
than silicon**. The highest possible metal signal in the spectrum shown in Fig. 1 is a
peak at an atomic mass of 23-24 u in agreement with sodium (Na⁺) and/or
magnesium (Mg⁺; the adjacent mass lines cannot be distinguished here), which is about
5 times less abundant than the Si⁺ signature. Two regions with mass lines that can
be attributed to K⁺/Ca⁺ at 39-40 u and unspecified species (at ~3.6 μs, or
56-60 u) are of weak significance, indicating even lower abundances if considered as
particle constituents. In contrast to these metals, silicon is not completely converted 
from elements to cations. It has a higher ionization potential and higher electron 
affinity, which lead to simultaneous formation of anions, cations and neutrals in 
the impact cloud. Laboratory calibrations imply the cationization probability of Si 
to be about 3 times lower than that of Mg”. In total, Fig. 1 implies a metal to silicon 
ratio below 1/10. By comparison, this ratio is 2/3 for the most metal-depleted silicates and ranges
from 1 to 2 for most rock-forming minerals. It is possible that traces of Na and K
have been transferred to the surface of nanosilica particles from remains of salt-rich 
carrier ice grains causing the weak signatures at mass 23u and 39u. We note that the 
observed possible metal signatures are upper limits for the particle constituents, as 
the CDA target is known to have a low-level contamination of Na and K (ref. 10). A 
bi-elemental cluster (C₂⁺, 24 u), formed from the highly abundant carbon contamination,
might also significantly contribute to the signal at mass 23-24 u. Consequently
it is possible that the potential weak metal mass lines stem entirely from contam- 
ination, and that stream particles are entirely metal-free. To summarize, while we 
cannot completely rule out that some of the weak signatures have contributions 
from metal ions stemming from the particle, their abundance is far too low to be in 
agreement with a rock-forming silicate. 
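
To illustrate the metal-to-silicon ratios referred to above, the following sketch counts metal atoms per silicon atom for two idealized end-member minerals and for pure silica. The choice of forsterite and enstatite as representatives of olivine and pyroxene, and the set of elements counted as metals, are assumptions made for this example.

```python
# Elements counted as metals in this illustration (an assumption).
METALS = {"Na", "K", "Mg", "Ca", "Fe", "Al"}

# Idealized end-member formulas written as element counts.
MINERALS = {
    "forsterite (olivine, Mg2SiO4)": {"Mg": 2, "Si": 1, "O": 4},
    "enstatite (pyroxene, MgSiO3)": {"Mg": 1, "Si": 1, "O": 3},
    "silica (SiO2)": {"Si": 1, "O": 2},
}


def metal_to_silicon(formula):
    """Number of metal atoms per silicon atom in a formula unit."""
    metal_atoms = sum(n for el, n in formula.items() if el in METALS)
    return metal_atoms / formula["Si"]


ratios = {name: metal_to_silicon(f) for name, f in MINERALS.items()}
for name, ratio in ratios.items():
    print(f"{name}: metal/Si = {ratio:.1f}")
```

The olivine and pyroxene end-members give ratios of 2 and 1, in line with the range quoted for rock-forming minerals, whereas the ratio inferred from Fig. 1 (below 1/10) lies much closer to the silica value of 0.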

In Fig. 1 the O⁺/Si⁺ ratio is about 2. However, in contrast to metals, oxygen has
a lower probability of forming cations than does silicon. Therefore, the O⁺/Si⁺
ratio should be clearly below 2 for a pure silica (SiO2) particle. From laboratory
calibration we expect it to be around 1. But as oxygen is known to be a target
contaminant that contributes to the O⁺ mass line to an unknown extent!®, the observed
ratio is consistent with SiO2.
Stream particle size estimate by integration of the Si⁺ signal. By integrating the
strongest Si⁺ signals, the number of Si⁺ ions created by the impinging particle can
be calculated. The idea that the impact ionizes all Si atoms is a simplification, but in
the case of ultra-fast stream particles it is probably sufficiently accurate to infer a
meaningful lower limit for the number of Si atoms in the particle. This in turn allows
for mass and size calculation, again a lower limit, if stream particles are assumed to
consist solely of SiO2.

The integrated signal of the Si⁺ peak in Fig. 1 is equivalent to about 1,500 ions.
As explained above, this signal probably stems from the largest measured stream
particles at the highest encounter speed (>200 km s⁻¹)*°. The ions recorded in
CDA mass spectra represent about 1/6.5 of the ions that were initially formed’. We 
conclude that the largest stream particles created about 10,000 Si cations upon 
impact. 

Under the assumption of a pure spherical SiO2 particle, we can now calculate a
size from this number. If we want to derive a strict lower limit on the largest particle
size, we have to assume a grain built of about 10,000 SiO2 molecules, which
leads to a particle radius of about 6 nm, if we assume a density of 2,200 kg m⁻³ for
amorphous silica. As mentioned above, it is highly probable that only a fraction of 
silicon is converted into cations even at the extreme impact speed of stream part- 
icles. A more realistic assumption is that only a third of Si atoms form cations, which 
gives a maximum particle radius of about 8.5 nm (for comparison, the largest Jovian 
stream particles reach radii of over 20 nm; refs 8, 37). 
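
Two of the scaling steps in this estimate can be followed with a few lines of arithmetic. The sketch below takes the recorded ion count, the 1/6.5 recording fraction and the ~6 nm lower-limit radius directly from the text, and assumes only that the radius of a sphere of fixed density scales with the cube root of the number of molecules. The function names are illustrative.

```python
def initial_ion_count(recorded_ions, recording_fraction_inverse=6.5):
    """Ions initially formed, given that only 1/6.5 of them are recorded."""
    return recorded_ions * recording_fraction_inverse


def scaled_radius(radius_nm, molecule_factor):
    """Radius after multiplying the molecule count by molecule_factor.

    At fixed density the volume scales with the molecule count, so the
    radius scales with its cube root.
    """
    return radius_nm * molecule_factor ** (1.0 / 3.0)


si_cations = initial_ion_count(1500)
# If only a third of the Si atoms form cations, the grain holds three
# times as many SiO2 molecules as the lower-limit case.
r_realistic = scaled_radius(6.0, 3)

print(f"Si cations formed on impact: about {si_cations:,.0f}")
print(f"radius for a one-third cation yield: about {r_realistic:.1f} nm")
```

The first number rounds to the roughly 10,000 cations quoted above, and the second lies close to the quoted radius of about 8.5 nm.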

Hydrothermal experiments and calculations. We performed hydrothermal exper- 
iments based on the methodology and apparatus employed in previous studies”. 
The starting mineral powder and solution were introduced into a flexible gold reaction
cell, pressurized to 400 bar with a steel alloy autoclave*". The pressure condition
corresponds to that of Enceladus’ rocky core (~150 km below the water-rock
interface). The effect of pressure is not critical for estimating the temperature required
for nanosilica formation. This is because the silica concentration equilibrated by the 
serpentine-talc/saponite buffer is not sensitive to pressure range within the core 
(~100-500 bar)**. The flexible gold reaction cell consists of a gold reaction bag and
a titanium head"!, which was oxidized before use. The flexible reaction cell allows 
us to perform an on-line collection of fluid samples during the experiments". 
See ref. 41 for more details. 

As starting minerals, we used a mixture of powdered olivine (San Carlos Olivine: 
Mg1.8Fe0.2SiO4) and orthopyroxene (MgSiO3) (orthopyroxene:olivine = 7:3 by
weight; 15 g in total). These are major minerals known to be abundant in asteroids 
and comets***®. San Carlos olivine contains trace amounts of spinel and pyroxene, 
which were the source of Al, Ca and other elements. We synthesized orthopyroxene
crystals using the flux method’. The starting solution (~60 g) was an aqueous
solution of NH3 (1.1 mol per kg H2O) and NaHCO3 (360 mmol per kg H2O).

We conducted two experiments at different temperatures. One was performed at 
a constant temperature of 300 °C for ~2,700 h of reaction time. In the other exper- 
iment, the temperature was set to 120 °C for an initial ~1,700 h of reaction time, 
and then increased to 200 °C (~2,300 h of reaction time in total). We measured the 
concentrations of dissolved silica and other major elements (for example, Na, Mg, 
Fe, Ca, Al and K) dissolved in the fluid samples during the reaction time with inductively
coupled plasma atomic emission spectroscopy (Perkin Elmer). Mineralogical
analyses for the solids collected after the experiments were performed with an 
X-ray diffraction spectrometer (X PERT-PRO PANalytical). The in situ pH of the 
solution in the experiments was calculated using the measured pH of the fluid sam- 
ples at room temperature and concentrations of dissolved gas and elements. The 
in situ pH values are calculated as 8.4-8.8, whereas measured pH values at room 
temperature were 10.1-10.2 at the end of the experiments. 

The ΣSiO2 concentration determined by chemical equilibrium between serpentine
and talc/saponite was calculated with equilibrium constants computed
with the SUPCRT92 program”. Given the similarity in the chemical compositions
of talc and saponite, we used the thermodynamic data of talc in the calculations.
The solubility of silica at 0 °C was calculated from thermodynamic data of
amorphous silica. The concentrations of HSiO3− and NaHSiO3(aq.) were calculated
for different pH values and at 0.1 mol per kg Na+ concentration using the
equilibrium constants of the following reactions: SiO2(aq.) + H2O ⇌ HSiO3− + H+
and HSiO3− + Na+ ⇌ NaHSiO3(aq.).
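
The two reactions above fix how total dissolved silica depends on pH and Na+ concentration. The following is a minimal sketch of that bookkeeping, assuming unit activity coefficients; the function name and the two equilibrium constants used in the demonstration are hypothetical placeholders, not the SUPCRT92-derived values used in the study.

```python
def total_silica(sio2_aq, ph, na_molal, k1, k2):
    """Total dissolved silica (mol per kg) summed over three species.

    k1: equilibrium constant of SiO2(aq) + H2O <-> HSiO3- + H+
    k2: equilibrium constant of HSiO3- + Na+ <-> NaHSiO3(aq)
    Activity coefficients are taken as 1 (an assumption of this sketch).
    """
    h_plus = 10.0 ** (-ph)
    hsio3 = k1 * sio2_aq / h_plus      # from the first reaction
    nahsio3 = k2 * hsio3 * na_molal    # from the second reaction
    return sio2_aq + hsio3 + nahsio3


if __name__ == "__main__":
    # Placeholder constants chosen only to make the arithmetic easy to follow.
    K1_DEMO, K2_DEMO = 1e-10, 1.0
    for ph in (8.0, 9.0, 10.0):
        print(ph, total_silica(1e-3, ph, 0.1, K1_DEMO, K2_DEMO))
```

With real equilibrium constants for the temperature of interest, the same three-term sum gives the pH dependence of silica solubility referred to in the text.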

We observed silica nanoparticle formation by cooling the fluid samples collected 
in the experiments at 300 °C. A part of the sample was cooled at ~0 °C in an ice 
bath, and then dialysis treatment (that is, fluid removal) was performed for several 
minutes. After the dialysis, a drop of the sample was mounted on a slide, and the 
excess solution was wicked away with tissue paper. Microscopic observations of the 
slide were performed with a field emission scanning electron microprobe (FE-SEM). 
Individual and clustered silica nanoparticles were observed (Fig. 2). The typical size 
of individual particles was ~5-20 nm in diameter. The energy dispersive spectrum 
indicates that they are composed mainly of Si and O with trace amounts of Na and 
Ca (Extended Data Fig. 4), which may be adsorbed on the surface of particles. 
Timescale of growth of nanosilica in Enceladus’ ocean. We estimated this on the 
basis of the equation shown in the previous study”. The primary size of nanosilica 
formed from alkaline aqueous solution is a few nanometres in radius'*'*""”. After 
the formation of these nanosilica particles, the size would increase slowly by pre- 
cipitation of dissolved silica onto the surface (Ostwald ripening)’*. The timescale of 
growth, t_g, of radius from r_0 to r by Ostwald ripening in pure water can be described
as follows (equation (14) in ref. 21):


tg=1°So(To)/RorsSo(T) 


where R_0 is the dissolution rate of silica, and S_0(T) and S_0(T_0) are the solubility of
silica at a given temperature, T, and that at the temperature, T_0, where the experimental
data were obtained (25 °C), respectively. According to the previous study”,
Ro for amorphous silica is 8.8 X 10° '° cms! at 0°C. The ratio of So(Ty)/So(T) is 
calculated as 1.67 for T = 0°C (ref. 21). Thus, the timescale for a 2-nm-sized nano- 
silica particle to grow to an 8-nm-sized particle at 0 °C in pure water is estimated as 
~20 years. If NaCl is included in the solution, the Ostwald ripening proceeds more 
rapidly, about 10-100 times faster than in pure water*’. Thus, the nanosilica par- 
ticles with radius of =8 nm, observed by Cassini CDA, should have been formed 
within months to years, suggesting continuing hydrothermal activity in Enceladus. 
Enceladus’ silicon footprint in Saturn’s magnetosphere. After being transported 
to the near-surface plume sources, nanosilica particles eventually become grain in- 
clusions in frozen water droplets'*”? from spray above Enceladus’ subsurface liquid 
plume sources—or they may entrain in the gas flow and serve as condensation seeds 
in the vent. After entering the E ring they are exposed to Saturn’s magnetosphere, 
separated from ice grains by differential plasma sputtering erosion and eventually 
ejected into interplanetary space as stream particles. 
About 1 mM (60 p.p.m.) of silica might in fact still be dissolved in liquid Enceladus
plume sources at 0 °C and become an additional ice grain constituent upon freez- 
ing. After sputtering erosion and ionization, this component, as well as erosion of 
nanosilica particles, contributes to the mass 28 ions observed by the Cassini plasma 
instruments CAPS (Cassini Plasma Spectrometer)” and MIMI (Magnetospheric 
Imaging Instrument)” at different energies. 

Analysis of CAPS ion measurements” shows that the density ratio between the
mass 28 and water group ions is about 6 × 10^−5, which corresponds to a mass fraction
of ~90 p.p.m. and interestingly is comparable to silica solubility at 0 °C (50 and
120 p.p.m. for pH = 9 and 10, respectively). The mass resolution of Cassini instruments
cannot distinguish between HCNH+, CO+, N2+ or Si+, and therefore no
solid conclusion can be drawn for the origin of the mass 28 ions at the current
stage’. The sputtering yield of Si-water ice mixture is unknown. Nonetheless, the
presence of nanosilica particles and ice grains forming from hydrothermal fluids 
surely will supply the magnetosphere with silicon ions. Future modelling efforts 
should focus on the ionization, ion lifetime and acceleration processes that may be 
responsible for the enhanced ratio of 28MT to water-group ions, (3-7) X 107°, at 
the 100 keV energy level*. 
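
The conversion from the mass 28 to water-group number-density ratio into a mass fraction can be checked with a short calculation. This is a sketch only: it takes the density ratio as 6 × 10^−5 (the value consistent with the quoted ~90 p.p.m.) and assumes a mean water-group ion mass of 18 u, which is an approximation because the water group spans several species.

```python
def mass_fraction_ppm(number_ratio, minor_mass_u=28.0, major_mass_u=18.0):
    """Convert a number-density ratio into a mass fraction in parts per million.

    minor_mass_u: mass of the minor ion (28 u for the mass 28 ions)
    major_mass_u: assumed mean mass of the water-group ions (18 u here)
    """
    return number_ratio * minor_mass_u / major_mass_u * 1e6


if __name__ == "__main__":
    print(round(mass_fraction_ppm(6e-5)))  # compare with the ~90 p.p.m. in the text
```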

Sample size. In data analyses above, no statistical methods were used to predeter- 
mine sample size. 
Extended Data Figure 1 | Maps of grain density, sputtering erosion rate,
and stream particle production rate in the E-ring region. a, The total E-ring
ice grain surface area map in the ρ-z frame, where ρ and z are distance to
Saturn’s rotation axis and to the ring plane, respectively. Note that each bin
integrates azimuthally over the entire torus, meaning that the outer bins
contain a much larger volume than do the inner ones. b, Plasma sputtering
erosion rate of E-ring ice grains in torus segments. The total sputtering rate is
8.6 X 10°* H2O molecules per second, lower but still comparable to the
4.5 X 10°° H2O molecules per second derived in ref. 32. c, Normalized
nanoparticle production rate in particles per second. Rs, Saturn radius.
Extended Data Figure 2 | Ejection probability of 5-nm particles from the
E ring. a, For silica nanoparticles, the ejection probability is mostly close to
unity (except within 4.5Rs). The higher local plasma density there leads to
negative dust potential and thus reduces the ejection probability’. The typical
timescale for silica nanoparticles to acquire sufficient kinetic energy to escape is
of the order of a day’. b, Water ice nanoparticles have lower secondary emission
and are charged less positively and thus are less likely to be ejected. This
‘forbidden region’ (the black region) extends further outward to ~5.5 Rs,
consistent with the CDA measurements”.
region (ER) profiles, derived from the nanodust and solar wind measurements
(blue)’ and the ejection model (red), both peak at 7-9Rs. The uncertainty
of both profiles stems from the adopted co-rotation fraction of Saturn’s
magnetosphere (80-100%), which determines the electromagnetic acceleration
amplitude. The location of the outer rim of Saturn’s A ring and the orbits of icy
satellites are marked by grey dashed lines. b, Latitudinal-dependent ejection

are shown in blue squares and crosses, respectively. The vertical length of the
crosses represents the standard deviation of the stream particle rate in the
corresponding bin. Our model (red) reproduces the measured trend.
c, d, Modelled patterns assuming direct ejection from Enceladus. While the ER
profile is similar, these particles are only ejected along the ring plane.
Extended Data Figure 4 | Energy dispersive spectrum of clustered silica nanoparticles formed from the fluid sample. See Methods for details.
Extended Data Table 1 | Stream particle flux measurements


UTC (medium)  Dur. (minute)  Dist. (Rs)  Lat. (°)  Impacts  Impact rate (# second^−1)  Production rate (# second^−1)


2004-346T01:05 280 36.4 -5.62 4 24x10* 86x10" 
2004-347T13:40 579 28.0 -3.45 10 2.9x10% 12x10!” 
2004-348T22:25 585 18.0 0.17 18 5.1x10% 91x 10'° 


2005-063T00:07 541 382 -0.10 15 AGx 10) 3x 10" 
2005-084T18:55 135 33.5 -0.14 18 2.2x10% 1.4x10'8 
2005-084T23:25 204 32.8 -0.14 15 1.2x10° 7.2x 10!” 
2005-085T23:25 187 28.1 -0.17 8 7.1x10% 3.1x 10!” 
2005-099T05:15 564 363 -4.19 14 Ai 10 sx 10” 
2005-100T00:00 264 34.5 -4.52 5 3.1x10% 21x10!” 
2005-228T16:05 464 31.4 -17.9 18 6.5x10% 19x 108 
2005-282T10:50 114 25.8 -0.30 16 23x10° 84x10!” 
2005-330T11:50 204 13.9 -0.37 35 28x10° 3.0x 10!” 
2005-336T21:45 319 39.0 -0.02 35 1.8x10°  1.5x10!8 


2006-116T17:18 187 24.2 -0.14 18 1.6x10° 5.1x 10!’ 
2006-136T05:23 86 40.9 0.14 3 
2006-146T17:20 150 35.9 0.44 7 
2006-147T12:50 50 40.1 0.43 2 6.7x10* 5.9x 10" 
2006-147T16:55 191 40.6 0.43 8 
2006-261T04:35 858 37.6 12.2 2 
2006-308T04:35 75 28.1 19.2 4 89x10* 21x10% 


Twenty observations obtained when Saturn was within 28° of the CDA bore-sight were selected. 278 impacts were registered during the total 100.8 h observation time. Data showing the flux enhancement caused 
by solar wind-nanodust interactions®° were excluded. The impact rate is normalized to a Saturn distance of 25 Rs (inverse-square law) and is converted to production rate by the modelled flux-latitude relation 
(Extended Data Fig. 3b). The weighted production rate is (8.3 ± 6.3) × 10^17 particles per second, corresponding to 1.0 ± 0.7 g per second (assuming a mean particle radius of 5 nm). UTC (medium), medium time of
observation in Coordinated Universal Time; Dur., duration; Dist., distance; Lat., latitude.
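
The step from a particle production rate to a mass production rate can be reproduced with a short calculation. This is a sketch under stated assumptions: particles are treated as spheres of 5 nm radius, the production rate is taken as 8.3 × 10^17 particles per second, and the bulk density of 2.2 g per cm^3 (a typical value for amorphous silica) is an assumption that the caption does not state.

```python
import math


def mass_rate_g_per_s(particles_per_s, radius_nm=5.0, density_g_cm3=2.2):
    """Mass production rate for spherical particles of a single radius.

    density_g_cm3 is an assumed bulk density, not a value from the table.
    """
    radius_cm = radius_nm * 1e-7
    volume_cm3 = 4.0 / 3.0 * math.pi * radius_cm ** 3
    return particles_per_s * volume_cm3 * density_g_cm3


if __name__ == "__main__":
    print(mass_rate_g_per_s(8.3e17))  # of order 1 g per second
```

Because the mass scales with the cube of the radius, the result is far more sensitive to the assumed particle size than to the assumed density.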
Observation of antiferromagnetic correlations in the
Hubbard model with ultracold atoms


Russell A. Hart1*, Pedro M. Duarte1*, Tsung-Lin Yang1, Xinxing Liu1, Thereza Paiva2, Ehsan Khatami3, Richard T. Scalettar4,
Nandini Trivedi5, David A. Huse6 & Randall G. Hulet1


Ultracold atoms in optical lattices have great potential to contribute 
to a better understanding of some of the most important issues in 
many-body physics, such as high-temperature superconductivity'. 
The Hubbard model—a simplified representation of fermions mov- 
ing on a periodic lattice—is thought to describe the essential details 
of copper oxide superconductivity*. This model describes many of 
the features shared by the copper oxides, including an interaction- 
driven Mott insulating state and an antiferromagnetic (AFM) state. 
Optical lattices filled with a two-spin-component Fermi gas of ultra- 
cold atoms can faithfully realize the Hubbard model with readily 
tunable parameters, and thus provide a platform for the systematic 
exploration of its phase diagram**. Realization of strongly correlated 
phases, however, has been hindered by the need to cool the atoms to 
temperatures as low as the magnetic exchange energy, and also by the 
lack of reliable thermometry’. Here we demonstrate spin-sensitive 
Bragg scattering of light to measure AFM spin correlations in a realization
of the three-dimensional Hubbard model at temperatures
down to 1.4 times that of the AFM phase transition. This temperature 
regime is beyond the range of validity of a simple high-temperature 
series expansion, which brings our experiment close to the limit of the 
capabilities of current numerical techniques, particularly at metallic 
densities. We reach these low temperatures using a compensated 
optical lattice technique’, in which the confinement of each lattice 
beam is compensated by a blue-detuned laser beam. The temper- 
ature of the atoms in the lattice is deduced by comparing the light 
scattering to determinant quantum Monte Carlo simulations’ and 
numerical linked-cluster expansion‘ calculations. Further refinement 
of the compensated lattice may produce even lower temperatures 
which, along with light scattering thermometry, would open avenues 
for producing and characterizing other novel quantum states of mat- 
ter, such as the pseudogap regime and correlated metallic states of 
the two-dimensional Hubbard model. 

A two-spin-component Fermi gas in a simple cubic optical lattice may
be described by a single-band Hubbard model with nearest-neighbour
tunnelling t and on-site interaction U > 0. At a density n of one atom
per site, and for sufficiently large U/t, there is a crossover from a ‘metallic’
state to a Mott insulating regime’ as the temperature T is reduced
below U. The Mott regime has been demonstrated with ultracold atoms
in an optical lattice by observing the reduction of doubly occupied
sites'® and the related reduction of the global compressibility'’. For T
below the Néel ordering temperature T_N, which for U ≫ t is approximately
equal to the exchange energy J = 4t^2/U, the system undergoes a
phase transition to an AFM state’’. In the context of quantum simulations,
AFM phases of Ising spins have been previously engineered with
bosonic atoms in an optical lattice’* and with spin-1/2 ions’*"’. Also,
nearest-neighbour AFM correlations due to magnetic exchange have
been observed along one dimension of an anisotropic lattice'®. The
same experiment achieved temperatures as low as T = 0.95t ≈ 2.6T_N
when the lattice was configured to be isotropic’’, where T_N = 0.36t is
the maximal value of the Néel transition temperature’*’*”.
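
The energy scales in this paragraph are related by simple arithmetic, sketched below. The function names are illustrative; the only physics inputs are the exchange energy J = 4t^2/U and the maximal Néel temperature of 0.36t quoted in the text.

```python
def exchange_energy(t, u):
    """Superexchange energy J = 4 t^2 / U, in the same units as t and U."""
    return 4.0 * t * t / u


def in_neel_units(temp_over_t, neel_over_t=0.36):
    """Express a temperature given in units of t as a multiple of T_N."""
    return temp_over_t / neel_over_t


if __name__ == "__main__":
    print(exchange_energy(1.0, 8.0))  # J in units of t at U/t = 8
    print(in_neel_units(0.95))        # T = 0.95t expressed in units of T_N
```

Evaluating `in_neel_units(0.95)` reproduces the factor of about 2.6 quoted for the earlier experiment.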

Our experiments are performed with an all-optically produced”, quantum
degenerate, two-state mixture of the two lowest hyperfine ground
states of fermionic ⁶Li atoms, which we label |↑⟩ and |↓⟩. The repulsive
interaction between atoms in states |↑⟩ and |↓⟩ is controlled via a magnetic
Feshbach resonance”', which we use to set the s-wave scattering
length a_s in the range from 80a_0 to 560a_0, where a_0 is the Bohr radius. A
simple cubic optical lattice is formed at the intersection of three mutu- 
ally perpendicular infrared retroreflected laser beams. We can dynam- 
ically rotate the polarization of the retroreflection, and thus continually 
adjust the potential between a lattice and a harmonic dimple trap. The 
overall confinement produced by the Gaussian envelope of each infra- 
red lattice beam is partially compensated with a superimposed, non- 
retroreflected, blue-detuned laser beam®”*. The compensation beams 
serve three purposes: (1) they help flatten the confining potential in 
order to enlarge the volume of the AFM phase; (2) they provide a way 
to maintain the central density near n ~ 1 as the lattice is loaded; and 
(3) they may mitigate the effects of heating in the lattice by lowering the 
threshold for evaporation. 

A degenerate sample with total atom number N between 1.0 × 10^5
and 2.5 × 10^5 is prepared in the harmonic dimple trap (without compensation)
at a temperature T/T_F = 0.04 ± 0.02, where T_F is the Fermi
temperature. The lattice is turned on slowly to a central depth of
v_0 = 7E_r (see Methods), where E_r = h^2/(2mλ^2) is the recoil energy, h
is Planck’s constant, m is the atomic mass, and λ = 1,064 nm is the
wavelength of the lattice beams. While loading the lattice, the intensities
of the compensation beams are adjusted to maintain a peak density
n ≈ 1. We have measured the temperature in the dimple trap before and
after transferring the atoms to the lattice (see Methods and Extended 
Data Fig. 3), and have observed that the compensating beams mitigate 
heating in the lattice, perhaps by allowing continued evaporative cool- 
ing® or by a reduction of three-body loss. 
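
For orientation, the recoil energy E_r = h^2/(2mλ^2) that sets the unit of lattice depth can be evaluated numerically. The sketch below uses CODATA constants and an atomic mass of 6.0151 u for ⁶Li; these numerical inputs and the function names are supplied here for illustration and are not quoted in the text.

```python
H = 6.62607015e-34        # Planck constant, J s
K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg
MASS_LI6 = 6.0151229 * AMU


def recoil_frequency_hz(wavelength_m=1.064e-6, mass_kg=MASS_LI6):
    """Recoil energy divided by h: E_r / h = h / (2 m lambda^2)."""
    return H / (2.0 * mass_kg * wavelength_m ** 2)


def recoil_temperature_uk(wavelength_m=1.064e-6, mass_kg=MASS_LI6):
    """Recoil energy expressed as a temperature, E_r / k_B, in microkelvin."""
    return recoil_frequency_hz(wavelength_m, mass_kg) * H / K_B * 1e6


if __name__ == "__main__":
    print(recoil_frequency_hz() / 1e3, "kHz")
    print(recoil_temperature_uk(), "uK")
```

Under these inputs the recoil energy is a little over 29 kHz in frequency units, so a lattice depth of 7E_r corresponds to roughly 200 kHz.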

Bragg scattering of near-resonant light**** is depicted in Fig. 1. The
Bragg condition for scattering from an AFM-ordered sample is satisfied
when the momentum Q transferred to a scattered photon is equal to π,
where π = (2π/a)(−1/2, −1/2, 1/2) is a reciprocal lattice vector of
the magnetic sublattice, and a = λ/2 is the lattice spacing. Cameras are
positioned to detect scattering at Q = π and also at Q = θ, a momentum
transfer that does not satisfy the Bragg condition and is used as a
control. We obtain spin sensitivity, in analogy to neutron scattering in
condensed matter, by setting the Bragg laser frequency between the
optical transition frequencies for the two spin states*®’’. Prior to the
measurement, we jump v_0 to 20E_r in a few microseconds to lock the atoms
in place (see Methods), and then illuminate them in situ for 1.7 µs with


1Department of Physics and Astronomy and Rice Quantum Institute, Rice University, 6100 Main Street, Houston, Texas 77005, USA. 2Instituto de Física, Universidade Federal do Rio de Janeiro, Caixa Postal
68.528, Rio de Janeiro RJ, 21941-972, Brazil. 3Department of Physics and Astronomy, San Jose State University, 1 Washington Square, San Jose, California 95192, USA. 4Department of Physics, University
of California, 1 Shields Avenue, Davis, California 95616, USA. 5Department of Physics, The Ohio State University, 191 West Woodruff Avenue, Columbus, Ohio 43210, USA. 6Department of Physics,
Princeton University, Princeton, New Jersey 08544, USA.
*These authors contributed equally to this work. 


12 MARCH 2015 | VOL 519 | NATURE | 211 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 




Figure 1 | Schematic depiction of Bragg scattering. a, Rendering of the
experimental set-up used for Bragg scattering. Light is collected for momentum
transfers Q = π and Q = θ. A bias magnetic field, which sets the quantization
axis and the interaction strength, points in the z direction. The input Bragg
beam lies in the y-z plane, and its wavevector makes an angle of 3° with the
positive y axis. b, The two spin states are denoted by red and blue circles. AFM
order develops at the Mott plateau, shown here to be located in the centre,
where n ≈ 1. AFM correlations are suppressed outside the central region where
n < 1. Bragg scattering requires the input and output wavevectors, k_in and k_out
respectively, to satisfy the Bragg condition k_out − k_in = π. The red and blue
arrows denote light scattered from one spin state or the other. The two spin
states scatter with opposite phase shifts, so that their respective sublattices
interfere constructively for Q = π. For a different momentum transfer
k_out − k_in = θ, scattering is relatively insensitive to AFM correlations owing
to the lack of constructive interference between the scattered photons, which
have random relative phases Δφ.


the Bragg probe. Alternatively, we can suddenly turn off the 20E_r lattice
and illuminate the atoms after time-of-flight τ.

Figure 2 shows the results of simultaneous measurements of the
scattered intensity for Q = π and Q = θ (I_π and I_θ, respectively), as a
function of τ. After a few microseconds of expansion, when the extent
of the atomic wave packets becomes comparable to the lattice spacing,
the light scattered from correlated spins no longer interferes constructively
at the detector. More precisely, the Debye-Waller factor
e^{−2W(τ)} = exp[−Σ_i Q_i^2 ⟨r_i^2⟩_τ] decays to zero after a sufficiently long τ
(see Methods) and the sample is effectively uncorrelated. Here r_i is the
displacement of an atom from the centre of the lattice site at which it
was initially localized.

By comparing the intensity of the light scattered in situ (τ = 0) to
that after sufficiently long τ (I_Q0 and I_Q∞, respectively), we effectively
normalize the Bragg scattering signal to the diffuse scattering background
of an uncorrelated sample, achieving high sensitivity to magnetic
ordering and strong rejection of common mode systematics. Figure 2
shows that there is enhanced scattering at τ = 0 relative to the uncorrelated
cloud (τ = 9 µs) for Q = π, whereas for Q = θ scattering at
Figure 2 | Time-of-flight measurement of scattered intensity from a sample
with AFM correlations. a, Normalized intensity of Bragg-scattered light
(Q = π) as a function of time-of-flight τ. The in situ (τ = 0) scattered intensity is
denoted I_Q0, while the intensity after sufficiently long τ, corresponding to an
effectively uncorrelated sample, is denoted I_Q∞. b, For Q = θ the in situ sample
shows a reduction of scattering, as compared to long τ, due to the presence
of double occupancies and to the presence of AFM spin correlations (see text).
Each data point and error bar is the mean and standard error of the mean
(s.e.m.) of at least 17 measurements of the scattered intensity. The solid grey line
is the intensity calculated using the value of the Debye-Waller factor at τ,
whereas the dashed grey line uses the average value of the Debye-Waller factor
during the 1.7 µs exposure of the Bragg probe (see text and Methods).


τ = 0 is reduced, such that I_θ0/I_θ∞ < 1. Double occupancies, present
as ‘virtual’ states even at low temperatures”, reduce coherent scattering
in all directions, since each atom in the pair has opposite spin and
therefore scatters with opposite phase. For Q = π the coherent enhancement
from AFM spin correlations exceeds this reduction. Furthermore,
the coherent enhancement of the signal along Q = π suppresses the
scattered intensity in other directions.

For a momentum transfer Q, the spin structure factor S_Q of the
sample is defined as

S_Q = \frac{4}{N} \sum_{i,j} e^{i Q \cdot (R_i - R_j)} \langle \sigma_{zi} \sigma_{zj} \rangle    (1)

Here N is the total number of atoms, the sums extend over all lattice
sites i and j, R_j is the location of the jth site, and σ_zj is the z component
of the spin operator for the jth site:

σ_zj|0⟩_j = 0|0⟩_j,  σ_zj|↑⟩_j = +(1/2)|↑⟩_j,  σ_zj|↓⟩_j = −(1/2)|↓⟩_j,  σ_zj|↑↓⟩_j = 0|↑↓⟩_j

In a sample with complete AFM ordering S_π ≈ N, whereas for uncorrelated
samples in the lattice S_π = 1 and S_θ = 1. The choice of the z spin
component for this analysis is arbitrary, as each of the other axes would
result in the same value for S_Q in the absence of a symmetry-breaking
field. In the limit of tightly localized wavefunctions (e^{−2W(τ=0)} ≈ 1),
and for a weak probe, the spin structure factor is S_Q ≈ I_Q0/I_Q∞. We
determine the spin structure factor by measuring the scattered intensities
I_Q0 and I_Q∞ and applying a correction to account for the in situ
Debye-Waller factor in the 20E_r lattice and for saturation of the atomic
transition, which generates a small component of inelastically scattered
light (see Methods).
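
The two limiting values quoted above (S_π ≈ N for complete AFM order, S_Q = 1 for an uncorrelated sample) can be illustrated with a small classical calculation. The following is a sketch only: it treats each site as holding exactly one atom with a definite σ_z = ±1/2, so the correlator ⟨σ_zi σ_zj⟩ reduces to a product and the double sum becomes a squared amplitude. The helper names and the 4 × 4 × 4 lattice are illustrative assumptions, not the experimental geometry.

```python
import cmath
import math
import random


def structure_factor(spins, q):
    """S_Q = (4/N) |sum_j sigma_zj exp(i Q.R_j)|^2 for one classical configuration.

    spins maps a site (x, y, z), in units of the lattice spacing a, to
    sigma_z = +0.5 or -0.5 (one atom per site). q is in units of 1/a.
    """
    amplitude = sum(
        s * cmath.exp(1j * (q[0] * x + q[1] * y + q[2] * z))
        for (x, y, z), s in spins.items()
    )
    return 4.0 * abs(amplitude) ** 2 / len(spins)


def sites(size):
    return [(x, y, z) for x in range(size) for y in range(size) for z in range(size)]


def neel_state(size):
    """Perfect antiferromagnet: spin alternates between neighbouring sites."""
    return {r: (0.5 if sum(r) % 2 == 0 else -0.5) for r in sites(size)}


def random_state(size, rng):
    """Uncorrelated sample: each spin is chosen independently."""
    return {r: rng.choice((0.5, -0.5)) for r in sites(size)}


# Q = pi = (2 pi / a)(-1/2, -1/2, 1/2), written in units of 1/a.
Q_PI = (-math.pi, -math.pi, math.pi)

if __name__ == "__main__":
    rng = random.Random(0)
    print(structure_factor(neel_state(4), Q_PI))  # equals N = 64
    samples = [structure_factor(random_state(4, rng), Q_PI) for _ in range(2000)]
    print(sum(samples) / len(samples))            # close to 1 on average
```

For the uncorrelated case only the average over many configurations approaches 1; a single configuration fluctuates strongly, which is one reason the measurement averages many shots.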

Within the local density approximation (LDA) we model the sample 
by considering each point in the trap as a homogeneous system in equi- 
librium at a temperature T, with local values of the chemical potential 
and the Hubbard parameters determined by the trap potential. The 
spin structure factor of the sample S_Q can then be expressed as the integral
over the trap of the local spin structure factor per lattice site, s_Q.
Figure 3a shows numerical calculations of s_π for various temperatures
in a homogeneous lattice with U/t = 8, close to where T_N is maximal’’.
The figure shows that s_π is sharply peaked around n = 1 and grows rapidly
as T approaches T_N from above.

Figure 3b and c shows n and s_π profiles, respectively, calculated for
our experimental parameters at various values of U_0/t_0, where U_0 and
Figure 3 | Numerical calculations. a, Spin structure factor per lattice site s_π as
a function of n in a homogeneous lattice for several temperatures (see
Methods). s_π is sharply peaked near n = 1 and diverges as T approaches T_N.
b, Density profiles calculated at T/t_0 = 0.6 for different U_0/t_0, using in each case
the value of N that maximizes the experimentally measured S_π (see text and
Extended Data Fig. 2). c, Profiles of the local spin structure factor s_π(r), for the
same conditions as in b. The vertical green line in b and c marks the radius at
which s_π(r) is maximized for U_0/t_0 = 11.1 (see text).


t_0 denote the local values of the Hubbard parameters at the centre of
the trap. As seen in Fig. 3b, only a fraction of the atoms in the sample is
near n = 1, where AFM correlations are maximal. The finite extent of
the lattice beams causes the lattice depth to decrease with distance
from the centre, resulting in an increasing t such that both U/t and
T/t decrease with increasing radius for constant T (see Extended Data
Fig. 1). The radial decrease in T/t causes s_π(r) to maximize at the largest
radius for which the density is n ≈ 1. For large U_0/t_0 the cloud exhibits
an n = 1 Mott plateau and s_π(r) is maximized at the outermost radius
of the plateau.

In the experiment, we measure S_Q as a function of U_0/t_0. At each
value of U_0/t_0 we vary the atom number N to maximize the measured
S_π (see Methods and Extended Data Fig. 2). According to the picture
presented above, this has the effect of optimizing the size and location
of the n = 1 region of the cloud such that AFM correlations are maximized.
The compensation strength g_0, which is the same for all U_0/t_0,
was also adjusted to maximize S_π. We found the optimum to be
g_0 = 3.7E_r at a lattice depth v_0 = 7E_r (see Methods). Besides the equilibrium
considerations regarding the optimal size and location of the
Mott plateau, we believe that the dynamical adjustment of g_0 during
the lattice turn-on reduces the time for the system to equilibrate, by
minimizing the deviation of the equilibrium density distribution in the
final potential from the starting density distribution in the dimple trap
before loading the lattice.

Figure 4 shows the measured values of S_π and S_θ at optimal N for
various values of U_0/t_0 (see Extended Data Fig. 5 for the raw counts at
the CCD cameras). We find that S_π is peaked for 11 < U_0/t_0 < 15. In
contrast, the measurements of S_θ vary little over the range of interaction
strengths, consistent with an absence of coherent Bragg scattering
in this direction. Measurements of S_π after hold time in the lattice
show that the Bragg signal decays for larger temperatures (see Extended
Data Fig. 4). Comparing the measured S_π with numerical calculations
for a homogeneous lattice (for example, those in Fig. 3a) allows us to set
a trap-independent upper limit on the temperature, which we determine
to be T/t_0 < 0.7.

Precise thermometry is obtained by comparing the measured S_π with
numerical calculations averaged over the trap density distribution for
different values of T. The results of such numerical calculations are
shown in Fig. 4, labelled by the value of T/t*, which we define as the
local value of T/t at the radius where the spin structure factor per lattice
site is maximal (see Fig. 3c). At U_0/t_0 = 11.1, where measured AFM
correlations are maximal, we find T/t* = 0.51 ± 0.06, where the uncertainty
is due to the statistical error in the measured S_π and the systematic
Figure 4 | Spin structure factor. Measured S_π (filled circles) and S_θ (open
circles) at optimized N (see text) for various U_0/t_0. The values of the s-wave
scattering length corresponding to U_0/t_0 for the experimental points are shown
along the top axis. For each point at least 40 in situ and 40 time-of-flight
measurements of the scattered intensities are used to obtain the spin structure
factor. Error bars are obtained from the s.e.m. of the scattered intensities; the
raw data are presented in Extended Data Fig. 5. Numerical calculations of
S_π (open symbols, lines as guide to the eye) and S_θ (open symbols, dashed lines
as guide to the eye) are shown for various values of T/t*. The numerical
calculations for S_θ are unreliable for T/t* < 0.7 and U_0/t_0 > 15. S_θ decreases
slightly for weak interactions, where the fraction of double occupancies
increases.


uncertainty in the lattice parameters used for the numerical calculation.
This temperature is consistent with the data at all values of U_0/t_0. We
warn, however, that for values of U/t > 10 a single-band Hubbard model
may not be adequate, as corrections involving higher bands may become
non-negligible?”.

As was shown in Fig. 3b, for U_0/t_0 = 11.1 the dominant contribution
to S_π comes from the outermost radius of the Mott plateau. At that radius,
the local value of U/t is U*/t* = 9.1, consistent with determinant
quantum Monte Carlo (DQMC) calculations for the homogeneous
lattice'”'*"°, which find T_N to be maximized for U/t between 8 and 9.
For U_0/t_0 = 11.1, t* = 1.3 kHz, so we can infer the temperature of the
system to be T = 32 ± 4 nK. In terms of T_N, the temperature is
T/T_N = 1.42 ± 0.16. At this temperature, the numerical calculations
indicate that the correlation length is approximately the lattice spacing.
The calculations show that the entropy per particle in the trap is
S/(N k_B) ≈ 0.76, where k_B is the Boltzmann constant (see Extended Data
Fig. 6). This entropy range is consistent with T/T_F measured in the
harmonic dimple trap” after a lattice round trip, as shown in Extended
Data Fig. 3.
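
The absolute temperature quoted above follows from T/t* and the tunnelling rate through T = (T/t*)(h t*/k_B), with t* given in frequency units. A minimal sketch of the conversion, using CODATA values for h and k_B and illustrative function names:

```python
H = 6.62607015e-34   # Planck constant, J s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def temperature_nk(t_over_tstar, tstar_hz):
    """Temperature in nanokelvin from T/t* and t* expressed as a frequency."""
    return t_over_tstar * tstar_hz * H / K_B * 1e9


def in_neel_units(t_over_tstar, neel_over_t=0.36):
    """Temperature as a multiple of the maximal Neel temperature (0.36 t)."""
    return t_over_tstar / neel_over_t


if __name__ == "__main__":
    print(temperature_nk(0.51, 1.3e3))  # central value, about 32 nK
    print(temperature_nk(0.06, 1.3e3))  # uncertainty, about 4 nK
    print(in_neel_units(0.51))          # about 1.42
```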

We have observed AFM correlations in the three-dimensional (3D) 
Hubbard model using ultracold atoms in an optical lattice via spin- 
sensitive Bragg scattering of light. Because magnetic order is extremely 
sensitive to T in the vicinity of Ty, Bragg scattering provides precise 
thermometry in regimes previously inaccessible to quantitative tem- 
perature measurements. Whereas previous cold-atom experiments on 
the 3D Fermi-Hubbard model were in a temperature regime that could 
be accurately represented by a simple high-temperature series expan- 
sion, the data presented here are near the limit of the capabilities of 
advanced numerical simulations. Our experimental set-up can be con- 
figured to study the two-dimensional (2D) Hubbard model in an array 
of planes; further progress to lower temperature will put us in a position 
to answer questions about competing pairing mechanisms in 2D, and 
may ultimately resolve the long-standing question of d-wave super- 
conductivity in the Hubbard model. 
METHODS


Preparation. ⁶Li atoms are first captured and cooled in a magneto-optical trap
(MOT) operating at 671 nm. They are further cooled in a second MOT stage employ- 
ing 323 nm light near resonant with the 2S — 3P transition. As described prev- 
iously”®, these atoms are laser cooled into a large-volume optical dipole trap (ODT) 
where a balanced spin mixture of the states |↑⟩ = |2S_1/2; F = 1/2, m_F = +1/2⟩ and
|↓⟩ = |2S_1/2; F = 1/2, m_F = −1/2⟩ is produced.

Once the large-volume ODT is loaded, we set the magnetic field to 340 G
(a_s ≈ −289a_0) to perform evaporative cooling. The intensities of the lattice beams
(1,064 nm) in dimple configuration (with the polarization of each retroreflection
perpendicular to that of each input beam) are turned on in 1 s. The depth of the
dimple, which at this point is only a small perturbation on the ODT, is adjusted to
control the final atom number in the experiment. The depth of the ODT is then
ramped to zero in 5.5 s to evaporatively cool the atoms into the dimple. To produce
a final sample with repulsive interactions, the magnetic field is increased to 595 G
(a_s ≈ 326a_0) in a 5 ms linear ramp starting 3 s into the evaporation trajectory.
Owing to the small volume of the dimple relative to the ODT, evaporation into the 
dimple is efficient and deeply degenerate samples are reliably produced. 

We measure T/T_F in the dimple trap by fitting the density profile, after 0.5 ms of
time-of-flight, to a Thomas-Fermi distribution*'. The magnetic field is tuned to
528 G to make the gas non-interacting before the measurement. For the experiments
reported here, the final dimple depths are in the range between 0.325E_r and
0.5E_r per axis, resulting in N in the range (1.0-2.5) × 10^5. The measured value
T/T_F = 0.04 ± 0.02 is independent of N within this range. The uncertainty in T/T_F
is the standard deviation of the fitted value for at least six independent realizations.
Here as elsewhere no statistical methods were used to predetermine sample size.
Compensated optical lattice. The experiment takes place in a compensated simple
cubic optical lattice potential that can be expressed as

$$V_{3D}(x, y, z) = V_{1D}(x; y, z) + V_{1D}(y; z, x) + V_{1D}(z; x, y)$$

where

$$V_{1D}(x; y, z) = V_L(x; y, z) + V_C(x; y, z)$$

and $V_L$, $V_C$ are the potentials produced by the lattice (1,064 nm) and compensation
(532 nm) beams, respectively:

$$V_L(x; y, z) = -v_0 \exp\left[-\frac{2(y^2 + z^2)}{w_L^2}\right] \cos^2\left(\frac{2\pi}{\lambda} x\right)$$

$$V_C(x; y, z) = g_0 \exp\left[-\frac{2(y^2 + z^2)}{w_C^2}\right]$$

Here, $v_0$ is the lattice depth and $g_0$ is the compensation ($v_0, g_0 > 0$). A schematic of
the compensated lattice, and the spatial variation of the Hubbard parameters due 
to the finite lattice beam waists, are shown in Extended Data Fig. 1. 
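
The compensated potential described in this section can be sketched numerically. The block below is a minimal illustration, not the authors' code: it assumes an attractive lattice term proportional to $\cos^2(2\pi x/\lambda)$ with a Gaussian radial envelope, a repulsive Gaussian compensation term, energies in units of $E_r$ and lengths in micrometres. The function names `v_1d` and `v_3d` are hypothetical, and the default depths and waists are the values quoted elsewhere in the Methods.

```python
import math

LAMBDA_UM = 1.064  # lattice wavelength in micrometres (1,064 nm, from the text)


def v_1d(x, y, z, v0, g0, w_l, w_c):
    """Potential of one lattice + compensation beam pair propagating along x.

    Energies are in units of E_r, lengths in micrometres. The functional form
    (attractive lattice, repulsive compensation) is an assumed reading of the
    Methods text.
    """
    r2 = y * y + z * z  # squared radial distance from the beam axis
    lattice = -v0 * math.exp(-2.0 * r2 / w_l**2) * math.cos(2.0 * math.pi * x / LAMBDA_UM) ** 2
    compensation = g0 * math.exp(-2.0 * r2 / w_c**2)
    return lattice + compensation


def v_3d(x, y, z, v0=7.0, g0=3.7,
         w_l=(47.0, 47.0, 44.0), w_c=(42.0, 41.0, 40.0)):
    """Sum of the three orthogonal beam pairs (cyclic permutation of x, y, z)."""
    return (v_1d(x, y, z, v0, g0, w_l[0], w_c[0])
            + v_1d(y, z, x, v0, g0, w_l[1], w_c[1])
            + v_1d(z, x, y, v0, g0, w_l[2], w_c[2]))
```

At the trap centre each axis contributes $g_0 - v_0$, so `v_3d(0, 0, 0)` is three times that value; far from the centre the Gaussian envelopes drive the potential to zero.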

The beam waists ($1/e^2$ radius) of the three axes are calibrated independently by
phase modulation spectroscopy of each lattice beam and by measuring the frequency
of breathing mode oscillations. The waists are found to be (up to a ±5%
systematic uncertainty) $w_L = (47, 47, 44)$ μm and $w_C = (42, 41, 40)$ μm, for beams
propagating along x, y, z, respectively.
Lattice loading. To load the lattice from the dimple trap, we first rotate the polarization
of the retroreflected beams parallel to that of the input beams in 100 ms. In
the following 25 ms, we increase the lattice depth to $2.5E_r$ and ramp the magnetic
field to set the final value of $U_0/t_0$. The lattice depth is then ramped to $7.0E_r$ in
15 ms.

Throughout the process of loading the lattice from the dimple, the power of the 
compensating beams is adjusted in order to maintain the peak density of the 
sample at $n \approx 1$. At the final lattice depth of $v_0 = 7.0E_r$, the average compensation
per beam is $g_0 = 3.7E_r$. The value of $g_0$ for each beam is adjusted slightly from this
average in order to create samples that appear spherically symmetric. 
Round-trip $T/T_F$ measurements. After loading the atoms into the $7E_r$ lattice we
wait for a hold time $t_h$ and then reverse the lattice loading ramps to return to the
harmonic dimple trap and measure $T/T_F$. This measurement, shown in Extended
Data Fig. 3, sets an upper limit on the entropy of the system in the lattice, and is
also a measure of the heating rate of the system in the lattice.

Temperature dependence of $S_\pi$. In Extended Data Fig. 4 we show $S_\pi$ as a function
of hold time in the lattice $t_h$, and observe that it decays for longer hold times, as
expected from the increase in $T/T_F$. Although the preparation of the sample and
the final potential are somewhat different for the data presented in Extended Data
Figs 3 and 4, the data support the contention that the Bragg signal decreases with
increasing $T$.

Variation of $N$ to maximize $S_\pi$. The global chemical potential $\mu_0$ must be increased
for larger $U_0/t_0$ to guarantee the formation of a Mott plateau in the trap. A larger $\mu_0$
results in larger atom number. $N$ is adjusted to maximize the Bragg signal for each
experimental value of $U_0/t_0$ in Fig. 4. We adjust $N$ by tuning the depth of the dimple
trap in which degeneracy is achieved before loading the atoms into the lattice. The
optimal value of $N$ as a function of $U_0/t_0$ is shown in Extended Data Fig. 2.

Spin structure factor measurement. We measure the spin structure factor at two
different values of the momentum transfer $\mathbf{Q}$ given by

$$\boldsymbol{\pi} = \frac{2\pi}{a}(-0.5, -0.5, +0.5)$$

$$\boldsymbol{\theta} = \frac{2\pi}{a}(+0.396, -0.105, -0.041)$$

where $a = \lambda/2$ is the lattice spacing.

We detect the scattered light using two separate cameras as the cloud is illuminated
with the Bragg probe beam for 1.7 μs. The Bragg probe beam is a collimated
Gaussian beam with a waist of 450 μm and 250 μW of power, resulting in an intensity
$I_p = 79$ mW cm$^{-2}$. The intensity of the probe determines the on-resonance
saturation parameter

$$s_0 = \frac{3\lambda_0^3}{\pi h c \Gamma}\, I_p\, |\hat{e}_p \cdot \hat{e}_{-1}|^2 = 15.5$$

where $c$ is the speed of light, $\hat{e}_p$ is the polarization of the probe light, $\hat{e}_{-1}$ is
the unit vector in the direction of the dipole matrix element of the transition,
$\lambda_0 = 671$ nm is the wavelength of the transition, and $\Gamma$ is its linewidth. The
polarization of the incident light in our experiment is linear and perpendicular to
the quantization axis, so $|\hat{e}_p \cdot \hat{e}_{-1}|^2 = 1/2$.
The Bragg probe detuning is set between the two spin states, such that
$\Delta = |\Delta_\uparrow| = |\Delta_\downarrow| = 6.4\Gamma$, where $\Delta_\uparrow$ and $\Delta_\downarrow$ are the detunings from the two spin
states.
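
The probe numbers quoted above can be cross-checked with a few lines of arithmetic. This is a sketch under stated assumptions rather than the authors' procedure: it takes the peak intensity of a Gaussian beam as $2P/(\pi w^2)$, the two-level saturation intensity as $\pi h c \Gamma/(3\lambda_0^3)$, and a natural linewidth of $2\pi \times 5.87$ MHz for the 671 nm transition, which is a value not given in the text. The function names are hypothetical.

```python
import math

H = 6.62607015e-34              # Planck constant, J s
C = 2.99792458e8                # speed of light, m/s
GAMMA = 2 * math.pi * 5.87e6    # ASSUMED natural linewidth of the 671 nm line, rad/s


def peak_intensity(power_w, waist_m):
    """Peak intensity of a Gaussian beam with 1/e^2 radius waist_m, in W/m^2."""
    return 2.0 * power_w / (math.pi * waist_m**2)


def saturation_parameter(intensity, wavelength_m, pol_overlap_sq, gamma=GAMMA):
    """On-resonance saturation parameter s0 = |e_p . e_-1|^2 * I / I_sat."""
    i_sat = math.pi * H * C * gamma / (3.0 * wavelength_m**3)
    return pol_overlap_sq * intensity / i_sat


i_p = peak_intensity(250e-6, 450e-6)             # 250 uW in a 450 um waist
s0 = saturation_parameter(i_p, 671e-9, 0.5)      # linear, perpendicular polarization
i_p_mw_cm2 = i_p * 0.1                           # 1 W/m^2 = 0.1 mW/cm^2
```

With these inputs the intensity and saturation parameter land close to the quoted 79 mW cm$^{-2}$ and 15.5.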

The spin structure factor is defined in equation (1) as a sum over lattice sites i, j.
By quickly ramping the lattice depth to $v_0 = 20E_r$, the state of the system is projected
into a product state, where the wavefunction of each atom is localized at a
lattice site. Hence, we can write $S_Q$ as a sum over particles m, n:

$$S_Q = \frac{4}{N} \sum_{m,n} e^{i\mathbf{Q}\cdot(\mathbf{R}_m - \mathbf{R}_n)} \langle\sigma_z\rangle_m \langle\sigma_z\rangle_n$$

where $\langle\sigma_z\rangle_n$ is the z component of the spin of the nth atom.

When illuminated with the probe light, each atom can be considered as an 
independent scatterer, and the intensity at the detector can be obtained by sum- 
ming the field contributions from the individual atoms and squaring the total field. 
We assume that the spatial wavefunction of all atoms is the harmonic oscillator 
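ground state in a lattice site of depth $v_0$, and that it does not change during the
measurement. The resulting intensity at the detector is given by

$$I_Q(t) = A\,\frac{s_0/2}{4\delta^2 + s_0}\,N + \frac{2 A s_0 \delta^2}{(4\delta^2 + s_0)^2}\, e^{-2W_Q(t)} \sum_{\substack{m,n \\ m \neq n}} 4\langle\sigma_z\rangle_m \langle\sigma_z\rangle_n\, e^{i\mathbf{Q}\cdot(\mathbf{R}_m - \mathbf{R}_n)} \qquad (2)$$

where $\delta = \Delta/\Gamma$, and $A = \frac{3 h c \Gamma}{8\pi \lambda_0 r_D^2} |\Lambda|^2$. Here $\Lambda$ is the polarization vector of the
scattered field, $\Lambda = \hat{n} \times (\hat{n} \times \hat{e}_{-1})$, where $\hat{n}$ is a unit vector pointing in the direction
of the detector, which is located at a distance $r_D$ from the sample.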

In equation (2) the first term arises from uncorrelated scattering by the atoms,
while the second term represents the interference due to magnetic correlations.
We can identify the spin structure factor in the interference term as

$$\sum_{\substack{m,n \\ m \neq n}} 4\langle\sigma_z\rangle_m \langle\sigma_z\rangle_n\, e^{i\mathbf{Q}\cdot(\mathbf{R}_m - \mathbf{R}_n)} = N(S_Q - 1)$$

and obtain

$$S_Q = 1 + C_Q(t)\left(\frac{I_Q(t)}{I_{Q\infty}} - 1\right)$$

where $I_{Q\infty} = A\,\frac{s_0/2}{4\delta^2 + s_0}\,N$, and the correction factor is $C_Q(t) = e^{2W_Q(t)}\left(1 + \frac{s_0}{4\delta^2}\right)$.

In the experiment we obtain $S_Q$ by combining measurements of the scattered
intensity in situ ($t = 0$) and after sufficiently long time-of-flight ($t = 6$ μs). The
correction factor takes the values $C_\pi(t = 0) = 1.52$ for $Q = \pi$ and $C_\theta(t = 0) = 1.18$
for $Q = \theta$.
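
The in situ correction factors can be estimated from the harmonic-oscillator picture used in this section. The sketch below is illustrative only and rests on assumptions that are not spelled out in the text: each site is treated as an isotropic harmonic well with level spacing $2E_r\sqrt{v_0/E_r}$, so that $2W_Q(0) = |\mathbf{Q}|^2 \langle x^2 \rangle$ reduces to $2q^2/\sqrt{v_0/E_r}$ when $\mathbf{Q}$ is written as $(2\pi/a)\,\mathbf{q}$. The function names are hypothetical.

```python
import math


def debye_waller_exponent(q_units, v0_er):
    """2*W_Q(t=0) for a harmonic-oscillator ground state.

    q_units: momentum transfer in units of 2*pi/a (three components).
    v0_er:   lattice depth in units of the recoil energy E_r.
    Assumes <x^2> = a^2 / (2 * pi^2 * sqrt(v0/E_r)) along every axis.
    """
    q_squared = sum(component * component for component in q_units)
    return 2.0 * q_squared / math.sqrt(v0_er)


def correction_factor(q_units, v0_er, s0, delta):
    """C_Q(t=0) = exp(2 W_Q(0)) * (1 + s0 / (4 delta^2))."""
    return math.exp(debye_waller_exponent(q_units, v0_er)) * (1.0 + s0 / (4.0 * delta**2))


Q_PI = (-0.5, -0.5, 0.5)
Q_THETA = (0.396, -0.105, -0.041)

c_pi = correction_factor(Q_PI, 20.0, 15.5, 6.4)
c_theta = correction_factor(Q_THETA, 20.0, 15.5, 6.4)
```

Under these assumptions both values come out within about one per cent of the quoted 1.52 and 1.18; the small residual for $Q = \pi$ is plausibly due to the harmonic approximation.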

Time-of-flight. After the atoms are released in time-of-flight, the Debye-Waller
factor decays as the atomic wavefunctions expand, resulting in a corresponding
decay of the Bragg scattered intensity. For a lattice of depth $v_0$

$$e^{-2W_Q(t)} = e^{-2W_Q(t=0)} \exp\left[-\frac{\sqrt{v_0/E_r}}{2}\left(\frac{|\mathbf{Q}|\, h\, t}{2 m a}\right)^2\right]$$

This equation was used to calculate the solid grey line in Fig. 2. The average value of
the Debye-Waller factor during the duration of the Bragg exposure
is used to calculate the dashed grey line in Fig. 2.

The data shown in Fig. 2 were taken at $U_0/t_0 = 13.4$ with $N = 2.5 \times 10^5$ atoms.
This value of $N$ is above the optimal value, so the ratio $I_{\pi 0}/I_{\pi\infty}$ in Fig. 2 gives
$S_\pi \approx 1.4$, which is less than the expected optimal value of $S_\pi$ from Fig. 4.
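
To see why a few microseconds of time-of-flight is "sufficiently long", the decay of the Debye-Waller factor can be evaluated directly. This is a hedged sketch: it assumes free expansion of a harmonic-oscillator ground state, for which the decay factor is $\exp[-(\sqrt{v_0/E_r}/2)(|\mathbf{Q}| h t / 2ma)^2]$, and it uses a $^{6}$Li mass of 6.0151 u, a value not stated in the text. The function name `dw_decay` is hypothetical.

```python
import math

H = 6.62607015e-34                    # Planck constant, J s
M_LI6 = 6.0151 * 1.66053907e-27       # ASSUMED mass of 6Li, kg
A_LATTICE = 532e-9                    # lattice spacing a = lambda/2, m


def dw_decay(t, q_units, v0_er, a=A_LATTICE, m=M_LI6):
    """Ratio exp(-2 W_Q(t)) / exp(-2 W_Q(0)) after a time-of-flight t (seconds).

    q_units is Q in units of 2*pi/a; v0_er is the lattice depth in E_r.
    """
    q_mag = 2.0 * math.pi / a * math.sqrt(sum(c * c for c in q_units))
    arg = q_mag * H * t / (2.0 * m * a)   # dimensionless
    return math.exp(-0.5 * math.sqrt(v0_er) * arg**2)


Q_PI = (-0.5, -0.5, 0.5)
decay_at_6us = dw_decay(6e-6, Q_PI, 20.0)
```

For the $20E_r$ lattice the factor at $t = 6$ μs is many orders of magnitude below one, so the interference term no longer contributes to the scattered intensity.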
Momentum transferred from the probe to the atoms. As mentioned above, we
assume that the spatial wavefunction of the atoms remains unchanged for the
duration of the exposure. For this assumption to be valid, the Lamb-Dicke parameter

$$\eta^2 = \frac{h^2/(2 m \lambda_0^2)}{2E_r\sqrt{v_0/E_r}}$$

needs to be $\ll 1$. In the $20E_r$ lattice, $\eta^2 = 0.27$, meaning that
approximately one out of every four photons scattered will excite an atom to the
second band of the lattice. An atom in the second band has larger position uncertainty
and hence a smaller Debye-Waller factor, which reduces its contribution to
the Bragg scattering signal.

The total number of photons scattered per atom is given by

$$N_p = t_{exp}\,\Gamma\,\frac{s_0/2}{4\delta^2 + s_0}$$

where the duration of the probe is $t_{exp} = 1.7$ μs. For $s_0 = 15.5$ and $\delta = 6.4$,
$N_p = 2.7$, thus justifying the assumption that the atoms remain in the lowest band
during the pulse.

For the Bragg scattering measurements performed after time-of-flight, the momentum
transferred from the probe to the atoms plays a more important role, since the
atoms are not trapped and will recoil after every photon scatter. Despite this, we
still see a good agreement between the observed decay of the Bragg scattering signal
and the decay expected for a Heisenberg-limited wave packet, as shown in Fig. 1.
We have also performed non-spin-sensitive Bragg scattering measurements from
the 010 planes of the lattice and observe the same agreement, justifying that
momentum transfer from the probe to the atoms can be neglected for the exposure
times used.

Optical density. A low optical density of the sample is important so that the probe
is unattenuated through the atom cloud, and multiple scattering events of the
Bragg scattered photons are limited’*. The optical density can be approximated as

$$\mathrm{OD} \approx \frac{\sigma_0\, |\hat{e}_p \cdot \hat{e}_{-1}|^2}{4\delta^2 + s_0}\, \frac{1}{a^2} \left(\frac{3N}{4\pi}\right)^{1/3}$$

where $\sigma_0 = 3\lambda_0^2/2\pi$. With $s_0 = 15.5$, $\delta = 6.4$ and $N = 1.8 \times 10^5$ atoms we have
OD ≈ 0.072. At this value we do not expect significant corrections to the spin
structure factor measurement due to the attenuation of the probe. We have not
included any corrections in our measurement due to finite optical density effects. 
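
The three estimates in the preceding paragraphs (Lamb-Dicke parameter, photons scattered per atom, optical density) are simple enough to reproduce. The sketch below is an illustration under assumptions, not the authors' calculation: it uses the strongly detuned two-level scattering rate $\Gamma (s_0/2)/(4\delta^2 + s_0)$, a harmonic band spacing of $2E_r\sqrt{v_0/E_r}$, a uniformly filled sphere with one atom per site for the column density, and an assumed linewidth of $2\pi \times 5.87$ MHz. All function names are hypothetical.

```python
import math

GAMMA = 2 * math.pi * 5.87e6   # ASSUMED natural linewidth of the 671 nm line, rad/s


def lamb_dicke_sq(lambda_probe, lambda_lattice, v0_er):
    """eta^2 = (probe recoil energy) / (harmonic band spacing)."""
    recoil_in_er = (lambda_lattice / lambda_probe) ** 2   # probe recoil in lattice E_r
    return recoil_in_er / (2.0 * math.sqrt(v0_er))


def photons_per_atom(t_exp, s0, delta, gamma=GAMMA):
    """N_p = t_exp * Gamma * (s0/2) / (4 delta^2 + s0)."""
    return t_exp * gamma * (s0 / 2.0) / (4.0 * delta**2 + s0)


def optical_density(n_atoms, s0, delta, lambda_probe, a, pol_overlap_sq=0.5):
    """OD of a uniformly filled sphere with one atom per lattice site."""
    sigma0 = 3.0 * lambda_probe**2 / (2.0 * math.pi)
    radius_in_sites = (3.0 * n_atoms / (4.0 * math.pi)) ** (1.0 / 3.0)
    return sigma0 * pol_overlap_sq / (4.0 * delta**2 + s0) * radius_in_sites / a**2


eta_sq = lamb_dicke_sq(671e-9, 1064e-9, 20.0)
n_p = photons_per_atom(1.7e-6, 15.5, 6.4)
od = optical_density(1.8e5, 15.5, 6.4, 671e-9, 532e-9)
```

The results agree with the quoted 0.27, 2.7 and 0.072 to within a few per cent; the remaining differences are at the level expected from the approximations listed above.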
Light collection. We collect Bragg scattered light in the $\pi$ direction over a full
angular width of 110 mrad, given by a 2.5 cm diameter collection lens located
23 cm away from the atoms. In the $\theta$ direction, light is collected by a 2.5 cm
diameter lens placed 8 cm away from the atoms, corresponding to a full angular
width of 318 mrad. The scattered light in each of the directions is focused to a few
pixels on the cameras, so no additional angular information is obtained. For
$N = 1.8 \times 10^5$, $s_0 = 15.5$, $\Delta = 6.4\Gamma$ and a 1.7 μs pulse, the detector in the $\pi$ direction
collects approximately 1,300 photons, whereas the detector in the $\theta$ direction
collects approximately $10^4$ photons. The noise floor from readout, dark
current and background light per shot has a variance equivalent to approximately
250 photons in the $\pi$ direction and 1,000 photons in the $\theta$ direction.

Data averaging. The signals we detect are small enough that an uncorrelated
sample may, in a single shot, produce a scattering signal as large as the ones
produced by samples with AFM correlations. To obtain a reliable measurement
of $S_\pi$, we average at least 40 in situ shots to obtain $I_{\pi 0}$ and at least 40 time-of-flight
shots to obtain $I_{\pi\infty}$.

We estimate the expected variance on $S_\pi$ by considering a randomly ordered
sample in which $e^{i\boldsymbol{\pi}\cdot\mathbf{R}_n}\, 2\langle\sigma_z\rangle_n$ is equal to +1 or −1 with equal probability. $S_\pi$ can be
written as

$$S_\pi = \left| \frac{1}{\sqrt{N}} \sum_n e^{i\boldsymbol{\pi}\cdot\mathbf{R}_n}\, 2\langle\sigma_z\rangle_n \right|^2$$

which is equivalent to the square of the distance travelled on an unbiased random
walk with step size $1/\sqrt{N}$. The mean and standard deviation can then be readily
calculated: $\overline{S_\pi} = 1$ and $\sqrt{\mathrm{Var}(S_\pi)} = \sqrt{2}$, where $\mathrm{Var}(S_\pi)$ denotes the variance of the
random variable $S_\pi$. With a standard deviation that is larger than the mean value, a
considerable number of shots needs to be taken in order to obtain an acceptable
error in the mean. The standard error of the mean for 40 shots will be
$\sqrt{2/40} = 0.22$, consistent with what we obtain in the experiment (see Fig. 4).
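
The random-walk argument above is easy to check by simulation. The following is a minimal Monte Carlo sketch, not part of the original analysis: each shot draws $N$ independent ±1 steps, forms the squared end-to-end distance divided by $N$, and the sample mean and standard deviation are compared with 1 and $\sqrt{2}$. The atom number per shot, the number of shots and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics


def simulate_s_pi(n_atoms, rng):
    """One uncorrelated shot: S_pi = (sum of n_atoms random +/-1 steps)^2 / n_atoms."""
    # Count the +1 steps by drawing n_atoms random bits at once.
    plus_steps = bin(rng.getrandbits(n_atoms)).count("1")
    walk = 2 * plus_steps - n_atoms
    return walk * walk / n_atoms


rng = random.Random(0)                       # fixed seed for reproducibility
shots = [simulate_s_pi(1000, rng) for _ in range(20000)]

mean_s = statistics.fmean(shots)             # expected to be close to 1
std_s = statistics.pstdev(shots)             # expected to be close to sqrt(2)
sem_40 = math.sqrt(2.0 / 40.0)               # standard error of the mean for 40 shots
```

Because the single-shot distribution is this broad, averaging over 40 shots is what brings the uncertainty on the mean down to the quoted level.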
Numerical calculations. DQMC and numerical linked-cluster expansion (NLCE) 
calculations are used to obtain the local values of the thermodynamic quantities in 
our trap, including the density, entropy, and the spin structure factor. DQMC 
calculations for arbitrary chemical potential (and hence density) can be obtained 
reliably down to temperatures slightly above the Néel temperature for a given U/t 
<9. For stronger interactions, intermediate values of n become inaccessible to 
DQMC owing to the sign problem, in which case we rely on the NLCE to obtain 
values of the thermodynamic quantities for arbitrary chemical potential down to 
temperatures as low as T/t = 0.40. 

DQMC results for a 6 × 6 × 6 lattice were obtained with the methodology
described in refs 7 and 32. Inverse temperature discretization $\Delta\tau = \beta/L$, where
$\beta = 1/T$ and $L = 20$, is sufficiently small that Trotter corrections are substantially
less than statistical error bars. DQMC data were obtained with 1,000 sweeps
through the lattice for equilibration, and between 5,000 (small $U$ and high $T$) and
200,000 (large $U$ and low $T$) sweeps for measurements. Finite-size effects were
assessed by comparing DQMC results for 6 × 6 × 6 and 8 × 8 × 8 lattices. Differences
are only appreciable when the spin structure factor per lattice site, $s_\pi > 5$.
The local value of $s_\pi$ is always less than 4 in calculations shown here, so DQMC
results in a 6 × 6 × 6 lattice are sufficient for the comparison with theory.

In NLCEs, an extensive property of the lattice model per site in the thermodynamic
limit is expressed in terms of contributions from finite clusters that can be
embedded in the lattice. NLCEs use the same basis as high-temperature expansions;
however, properties of clusters are calculated via exact diagonalization, as
opposed to a perturbative expansion in powers of the inverse temperature**’. The
site-based NLCE for the Hubbard model” is implemented here for a 3D lattice and
carried out to the eighth order for all thermodynamic quantities, except for $S_Q$,
where, owing to the reduced symmetry, only seven orders were obtained. Within its
region of convergence ($T/t \gtrsim 1.5$ for any $n$ and $U$), NLCE results do not contain
any systematic or statistical errors. The convergence region extends to much lower
$T/t$ at $n = 1$ and generally improves with increasing interaction strength. At
lower $T/t$, we take advantage of numerical resummations, such as Euler and
Wynn transformations*’, to obtain an estimate. The NLCE provides a fast tool,
which, given the value of $U/t$, generates results on a dense temperature and chemical
potential grid in a single run.
Local density approximation. The local density approximation, which has been
previously shown to agree well with ab initio DQMC simulations of the trapped
Hubbard Hamiltonian**, was used to calculate the trap profiles of the different
thermodynamic quantities. The spin structure factor $S_Q$ is obtained from the trap
profile of the spin structure factor per lattice site as

$$S_Q = \frac{1}{N a^3} \int s_Q(\mathbf{r})\, d^3r$$

For the numerical calculations we set $T$ and $\mu_0$; local values of $U/t$, $T/t$, and the local
chemical potential $\mu/t$ are calculated using the known trap potential. The local
values of the thermodynamic quantities are then obtained by interpolation from
NLCE and DQMC results for a homogeneous system calculated in a ($U/t$, $T/t$, $\mu/t$)
grid. Radial profiles for the local value of $U/t$, $T/t$, and $\mu/t$ along a body diagonal of
the lattice were used and spherical symmetry assumed.
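
The trap average in the local density approximation reduces to a radial integral once spherical symmetry is assumed. The sketch below illustrates only that integration step and is not the authors' code: the radial profiles `s_of_r` (spin structure factor per site) and `n_of_r` (density per site) are hypothetical callables that would, in practice, come from interpolating the NLCE and DQMC tables, and the atom number is taken as the integral of the density over the lattice sites.

```python
import math


def trap_averaged_s(s_of_r, n_of_r, r_max, a=1.0, steps=20000):
    """Return (S_Q, N) from radial profiles, assuming spherical symmetry.

    s_of_r(r): spin structure factor per lattice site at radius r.
    n_of_r(r): atoms per lattice site at radius r.
    Lengths are in the same units as the lattice spacing a.
    Uses a midpoint rule over spherical shells.
    """
    dr = r_max / steps
    s_integral = 0.0
    n_integral = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        sites_in_shell = 4.0 * math.pi * r * r * dr / a**3
        s_integral += s_of_r(r) * sites_in_shell
        n_integral += n_of_r(r) * sites_in_shell
    return s_integral / n_integral, n_integral


# Toy check: a uniformly filled sphere of radius 30 sites with s = 2 everywhere.
s_q, n_atoms = trap_averaged_s(
    lambda r: 2.0 if r < 30.0 else 0.0,
    lambda r: 1.0 if r < 30.0 else 0.0,
    r_max=40.0,
)
```

For the uniform toy profile the trap average returns the local value, as expected; in a real trap the shells at different density contribute with different weights.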
Entropy. In Fig. 4 we compare the experimental results at various $U_0/t_0$ with
calculations at constant $T$. Since ultracold atoms are isolated systems, a constant
value of the overall entropy per particle $S/(Nk_B)$ may be more appropriate. We find
that over the range $10 < U_0/t_0 < 15$, where AFM correlations are largest, $S/(Nk_B)$
does not vary significantly with $U_0/t_0$ at constant $T$ (Extended Data Fig. 6). This
implies that we do not expect adiabatic cooling for stronger interactions'*”’, and
thus the curves at constant $T$ are suitable to describe the experimental data.
Code availability. The codes used for DQMC and NLCE calculations are available
by request from the authors.

Extended Data Figure 1 | Compensated optical lattice. a, Schematic of the
compensated optical lattice set-up. Along each axis, the radial confinement of
the lattice is compensated with a repulsive compensation beam which is
combined with the lattice beam using a dichroic mirror. The compensation
beam co-propagates with the lattice beam but is not retroreflected; instead a
dichroic mirror before the retro-reflection mirror is used to direct the
compensation beam to a beam dump. b, The local value of the lattice depth $v$
(black line; right-hand y axis) is shown as a function of distance from the centre
along a body diagonal of the lattice. Owing to the finite extent of the lattice
beams, $v$ varies across the density profile of the cloud. The density $n$, calculated
for $U_0/t_0 = 11.1$ at $T/t_0 = 0.60$, is shown (blue line; left-hand y axis). c, The
inhomogeneity in $v$ results in spatially varying Hubbard parameters $t$ (blue line;
left-hand y axis) and $U/t$ (black line; right-hand y axis).

Extended Data Figure 2 | Atom number for the data in Fig. 4. Atom number
$N$ which maximizes $S_\pi$ as a function of $U_0/t_0$. We control $N$ by adjusting the
depth of the dimple trap. Using a linear calibration between the depth of
the dimple trap and the final atom number, we obtain the value of $N$
corresponding to the data in Fig. 4. The error bars correspond to the s.e.m. of
the dimple depths used in at least 40 in situ and 40 time-of-flight realizations of
the experiment, corresponding to the data in Fig. 4. The line is a third-order
polynomial fit, which is used to interpolate the value of $N$ for numerical
calculations shown in Fig. 4.

Extended Data Figure 3 | Round-trip temperature measurements.
Measurement of the round-trip $T/T_F$ versus hold time $t_h$ in a compensated
lattice with $v_0 = 7E_r$ and $g_0 = 3.7E_r$. The duration of the loading ramps is
not included in $t_h$. The scattering length is $326a_0$, which corresponds to
$U_0/t_0 = 12.5$. Error bars are the s.e.m. of six independent realizations. The
temperature in the dimple trap before loading into the lattice is
$T/T_F = 0.04 \pm 0.02$.

Extended Data Figure 4 | Bragg signal decay with hold time. a, Detected
counts (from CCD camera) versus $t_h$, measured for momentum transfer $Q = \pi$
for an in situ sample ($I_{\pi 0}$, green circles) and after decay of the Debye-Waller
factor ($I_{\pi\infty}$, blue triangles). For longer hold times, the Bragg-scattered intensity
$I_{\pi 0}$ decays to match $I_{\pi\infty}$, reflecting the absence of AFM correlations in a sample
at higher $T$. b, The spin structure factor $S_\pi$ corresponding to the scattered
intensities shown in a. For these measurements the scattering length is $200a_0$,
corresponding to $U_0/t_0 = 7.7$ in a $7E_r$ deep lattice. The compensation is
$g_0 = 4.05E_r$, different from that used for the data in Fig. 4. The increased
compensation requires a larger atom number to realize an $n \approx 1$ shell in the
cloud. The atom number used here is $2.6 \times 10^5$ atoms. The duration of the
Bragg probe is 2.7 μs for these data. Error bars in a are the s.e.m. of at least 5
measurements for $I_{\pi\infty}$ and at least 10 measurements for $I_{\pi 0}$. Error bars in b are
obtained from the s.e.m. of the measured intensities and equation (2).

Extended Data Figure 5 | Detected counts for measurement of spin
structure factor in Fig. 4. a, Detected counts versus $U_0/t_0$, measured for
momentum transfer $Q = \pi$ for an in situ sample ($I_{\pi 0}$; green circles), and after
decay of the Debye-Waller factor ($I_{\pi\infty}$, blue triangles). As $U_0/t_0$ increases we use
a larger atom number to optimize the Bragg signal. $I_{\pi\infty}$ and $I_{\pi 0}$ both increase
with $U_0/t_0$ owing to the larger $N$, but $I_{\pi 0}$ shows an additional enhancement
due to the presence of AFM correlations. b, Detected counts versus $U_0/t_0$,
measured for momentum transfer $Q = \theta$ for an in situ sample ($I_{\theta 0}$, green
circles), and after decay of the Debye-Waller factor ($I_{\theta\infty}$, blue triangles). For
$Q = \theta$ most of the dependence for both the in situ and time-of-flight intensities
is due to the changing $N$. Error bars in both a and b are the s.e.m. of at least 40
measurements. The overall count rate is higher for $Q = \theta$ owing to the different
collection efficiency and gain settings of the CCD camera.

Extended Data Figure 6 | Entropy per particle at constant T. Overall entropy
per particle $S/(Nk_B)$ as a function of $U_0/t_0$ for the calculations at various $T/t_*$
shown in Fig. 4 (lines are guides to the eye). For the lowest temperatures,
$S/(Nk_B)$ does not vary significantly over the range of $U_0/t_0$ covered by the
experiment, justifying the treatment at constant $T$. A value of $S/(Nk_B) \approx 0.76$ is
obtained for the temperature determined from the data in Fig. 4.

Decrease in CO₂ efflux from northern hardwater
lakes with increasing atmospheric warming


Kerri Finlay’, Richard J. Vogt'+, Matthew J. Bogard'}, Bjorn Wissel’, Benjamin M. Tutolo*, Gavin L. Simpson? & Peter R. Leavitt"? 


Boreal lakes are biogeochemical hotspots that alter carbon fluxes
by sequestering particulate organic carbon in sediments’” and by oxidizing
terrestrial dissolved organic matter to carbon dioxide (CO₂)
or methane through microbial processes**. At present, such dilute
lakes release ~1.4 petagrams of carbon annually to the atmosphere**,
and this carbon efflux may increase in the future in response to elevated
temperatures’ and increased hydrological delivery of mineralizable
dissolved organic matter to lakes®’. Much less is known about the
potential effects of climate changes on carbon fluxes from carbonate-rich
hardwater and saline lakes that account for about 20 per cent of
inland water surface area**. Here we show that atmospheric warming
may reduce CO₂ emissions from hardwater lakes. We analyse
decadal records of meteorological variability, CO₂ fluxes and water
chemistry to investigate the processes affecting variations in pH and
carbon exchange””® in hydrologically diverse lakes of central North
America. We find that the lakes have shifted progressively from being
substantial CO₂ sources in the mid-1990s to sequestering CO₂ by
2010, with a steady increase in annual mean pH. We attribute the
observed changes in pH and CO₂ uptake to an atmospheric-warming-induced
decline in ice cover in spring that decreases CO₂ accumulation
under ice, increases spring and summer pH, and enhances the
chemical uptake of CO₂ in hardwater lakes. Our study suggests that
rising temperatures do not invariably increase CO₂ emissions from
aquatic ecosystems.

Boreal lakes are important in global carbon (C) cycles because they
receive ~2.9 Pg C per year from terrestrial sources**, permanently bury
~0.6 Pg per year as particulate C (refs 1, 2), and mineralize up to 50% of
the remainder to CO₂ and methane’ through bacterial activity in the
water column"! and sediments”. In general, dilute unproductive lakes
release more gaseous C than is fixed by aquatic photosynthesis''!*"*,
whereas net CO₂ uptake occurs in some productive basins when elevated
nutrient influx intensifies primary production and labile organic
C is incompletely mineralized by bacteria**’’. At present, the magnitude
of C fluxes from boreal lakes is similar to those arising from global
deforestation, oceanic CO₂ sequestration and net terrestrial production‘;
however, future mineralization of organic matter is predicted to intensify
under a warmer’ or wetter climate®’.

Less is known about how solute-rich hardwater lakes influence planetary
C fluxes*’, despite accounting for ~50% of inland waters by volume”®
(23% by area)’, in part because pH regulates inter-annual variation in
atmospheric CO₂ exchange at these sites independently of microbial
metabolism during summer®”, and because controls of inter-annual
variation in pH are poorly understood”"°. Typically, hardwater lakes
are alkaline (8 < pH < 11), rich in dissolved inorganic C (DIC) derived
from catchment sources of HCO₃⁻ and CO₃²⁻, and evade (release) much
more CO₂ (up to 200 mmol C m⁻² d⁻¹) than do boreal lakes (up to
60 mmol m⁻² d⁻¹) when pH values are below 9.0 (refs 4, 8-10). At higher
pH, CO₂ is converted to HCO₃⁻ and CO₃²⁻ (ref. 17), partial pressure
of CO₂ (pCO₂) declines to below atmospheric values, and hardwater
lakes sequester atmospheric CO₂ (refs 8-10). Furthermore, DIC-rich
hardwater and saline lakes exhibit a high degree of spatial synchrony in
mean summer pH’*”’ and can rapidly vary the direction and magnitude
of CO₂ flux’. Thus, a better understanding of the mechanisms regulating
inter-annual variation in pH and carbon processing of hardwater
lakes is essential to quantify the contribution of these ecosystems to the
global carbon cycle*”®.

We analysed 16 years of meteorological and limnological data collected
every two weeks during May to August from six lakes, a 28-year record
of weekly chemical determinations at one lake, and surveys of water
chemistry in an additional 20 (seasonal) to 70 (annual) lakes to identify
factors regulating inter-annual variation in lake pH and CO₂ flux within
a 236,000 km² region of the Northern Great Plains of North America
(Extended Data Fig. 1). Our grassland study region represents more
than 40% of all cultivated land in Canada and is composed mainly (75%)
of agricultural fields and pastures, particularly within the 52,000 km²
Qu’Appelle River catchment (50° 00’-51° 30’ N, 101° 30’-107° 10’ W)
of southern Saskatchewan”’. Study lakes within this drainage basin vary
tenfold in most morphometric, hydrological and limnological features
(Extended Data Table 1), include both reservoirs (Wascana and Diefenbaker
lakes) and sites with limited hydrological outflow (Last Mountain
Lake), yet are all alkaline (mean summer pH ~8.8; 30-60 mg DIC l⁻¹)
and well mixed (except occasionally stratified Katepwa Lake) and have
common plankton composition and trophic relationships’®”’.

Analysis of 16 years of water chemistry and C flux data revealed that
Qu’Appelle lakes have shifted progressively from being large CO₂ sources
in the mid-1990s to sequestering substantial amounts of CO₂ at present
(Fig. 1). The annual pH of these lakes has steadily increased from
8.3 ± 0.1 in 1995 to 9.2 ± 0.1 in 2010 (means ± s.e.m.; n = 6; Fig. 1a),
whereas total inorganic carbon (TIC) (Fig. 1b), hydrological influx’ (not
shown) and lake production’ (not shown) have remained essentially
unchanged. The consequence of these shifts is that aquatic pCO₂ has
declined nearly tenfold in all lakes (Fig. 1c), atmospheric CO₂ evasion
has decreased by nearly 100 g C m⁻² per summer (Fig. 1d), and
lakes now sequester substantial quantities of CO₂ (37.4 ± 6.5 g C m⁻²
per summer) (Fig. 1d). Despite the marked physical, hydrological and
chemical differences between lakes (Extended Data Table 1), inter-annual
variation in pH and CO₂ parameters is highly coherent among
ecosystems’ and shows spatial patterns of synchrony that are characteristic
of ecosystem regulation by energy influx (air temperature, irradiance)
rather than by mass influx (precipitation, runoff, solutes)'?”?.

Principal components analysis suggests that the mean summer pH 
of Qu’Appelle lakes increased as a function of both spring and annual 
air temperatures, was correlated inversely with the duration and date 
of ice melt, and was uncorrelated with other measured meteorological 
variables (Extended Data Fig. 2a). In particular, pH was elevated under 
warmer atmospheric conditions, including those associated with a nega- 
tive Southern Oscillation Index and positive (warm) phase of the Pacific 
Decadal Oscillation, both of which represent mild winters and reduced 
ice cover’’. In contrast, mean summer pH was not correlated strongly 
with any measured aspect of lake chemistry, other than ammonium
(NH₄⁺) concentration (Extended Data Fig. 2b). Together these patterns
are consistent with previous observations from other hardwater
lakes and suggest that prolonged ice cover arising from cold winters
favours increased CO₂ accumulation under ice and declines in under-ice
pH after CO₂ hydration and carbonic acid formation**”.

Figure 1 | Temporal changes in summer pH and CO₂ flux in six hardwater
lakes of central Canada. a, Surface water pH; b, total inorganic carbon
(TIC) concentration (mg C l⁻¹); c, log₁₀ partial pressure of CO₂ (μatm);
d, chemically enhanced flux of CO₂ (g C m⁻² per summer). All time series are
unweighted means and s.e.m. (n = 6). Least-squares regression analysis
revealed linear increases in pH and declines in pCO₂ and CO₂ flux when
conducted using years with complete summer sampling. Least-squares
regression analyses exclude 2000, a year lacking samples during late July to
September. Mean pCO₂ of the atmosphere (370 μatm) is indicated in c with a
horizontal dashed line.

Least-squares regression analysis of the decadal time series for
Qu’Appelle lakes also showed that warmer winter temperatures were
correlated negatively with both the duration of ice cover (Fig. 2a) and
the date of ice melt (Fig. 2b), as has been noted elsewhere””®. During
years of prolonged cover, the date of ice melt is delayed by up to 20 days
and the pH of Qu’Appelle lakes during spring (see below) and summer
can be depressed by up to 1 pH unit (Fig. 2c). As shown in diverse lake
districts, prolonged ice cover allows the accumulation of CO₂ from mineralized
organic matter, which in turn hydrates to lower pH through
formation of carbonic acid’’””*. Furthermore, as pH is depressed, chemical
equilibria dictate that a higher proportion of DIC is present as free
CO₂, which can evade to the atmosphere”’. In Qu’Appelle lakes, summer
pH values below 9.0 were associated with substantial CO₂ evasion,
whereas these lakes captured up to 50 g C m⁻² during summers with a
mean pH of >9.0 (Fig. 2d).
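
The statement that a lower pH leaves a larger share of DIC as free CO₂ follows from the carbonate equilibria, and can be illustrated with a short calculation. The sketch below is not the geochemical model used in the study: it assumes textbook freshwater dissociation constants at 25 °C (pK₁ = 6.35, pK₂ = 10.33) and ignores temperature, salinity and ion-pairing effects. The function name is hypothetical.

```python
def free_co2_fraction(ph, pk1=6.35, pk2=10.33):
    """Fraction of dissolved inorganic carbon present as free CO2 at a given pH.

    pk1 and pk2 are ASSUMED textbook values for dilute fresh water at 25 C.
    Fraction = 1 / (1 + K1/[H+] + K1*K2/[H+]^2).
    """
    h = 10.0 ** (-ph)
    k1 = 10.0 ** (-pk1)
    k2 = 10.0 ** (-pk2)
    return 1.0 / (1.0 + k1 / h + k1 * k2 / (h * h))


fraction_1995 = free_co2_fraction(8.3)   # mean pH reported for 1995
fraction_2010 = free_co2_fraction(9.2)   # mean pH reported for 2010
ratio = fraction_1995 / fraction_2010
```

With essentially constant total inorganic carbon, the free CO₂ fraction falls severalfold between pH 8.3 and pH 9.2 under these assumptions, which is the same direction and rough size as the reported decline in pCO₂.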

Detailed study of Buffalo Pound Lake, Saskatchewan, Canada, within 
the Qu’Appelle catchment illustrates the linkage between the duration 
of ice cover, the metabolic production of CO₂ and the depression of pH
in spring and summer waters (Fig. 3). This lake has been monitored 
continuously at weekly intervals since 1979, with comprehensive chem- 
ical analysis from 1985 to 2003. Here we found a strong negative rela- 
tionship between the duration of ice cover and the mean lake water pH 
during March, the month immediately preceding ice melt (Fig. 3a). In 
addition, spring pH was correlated positively with coeval determinations 
of oxygen content (Fig. 3b), suggesting that variations in mineralization 
of organic matter by microbes underlie both patterns”””*. Finally, we 
found a strong linear relationship between pH during March and values 
recorded in subsequent months (Fig. 3c), suggesting that variation in 
under-ice conditions can alter pH during the following summer. 


216 | NATURE | VOL 519 | 12 MARCH 2015 


Statistical and geochemical modelling of winter water chemistry from
1985 to 2003 reveals that metabolic production of CO₂ was the main
control of inter-annual variation in the spring pH of Buffalo Pound Lake.
For example, elastic net analysis explained 81% of observed deviance
in winter pH and showed that microbial metabolism under ice was the
principal predictor of the pH at spring ice melt, with a nearly fourfold
greater standardized coefficient (0.14) than either HCO₃⁻ or Ca²⁺ (0.04),
the only other significant and substantial model predictors (Extended
Data Fig. 3). Similarly, geochemical modelling demonstrated that under-ice
CO₂ evolution (O₂ decline × respiratory quotient of 1.2 = CO₂ production)
was sufficient to depress the pH from values observed at ice
formation (8.32 ± 0.06; mean ± s.e.m.) to those (7.83 ± 0.07) similar
to values observed at the winter pH minimum (7.78 ± 0.08) or date of
ice melt (8.06 ± 0.08). Finally, geochemical modelling revealed that
spring ice melt resulted in a short-lived release of CO₂ but that the resultant
increase in water-column pH was too small to uncouple the linear
relationship between spring and summer pH (Extended Data Fig. 4).
Together, these patterns suggest that variation in ice cover regulates the
magnitude and direction of atmospheric CO₂ exchange by controlling
spring and summer pH, altering the duration of the ice-free period and
changing the proportion of time in which water-column pH is above or
below the threshold of 9.0 (Fig. 2d).

Monitoring of other regional lakes since 2002 (refs 18, 29) has revealed 
that pronounced inter-annual variation in mean and seasonal pH is 
common and synchronous within the grassland region of central Canada 
(Extended Data Fig. 5). For example, the mean summer pH of Qu’Appelle 
lakes during 2002-2009 was highly correlated with that of ~20 DIC-rich 
closed-basin lakes within an independent 100,000 km² region (Extended
Data Fig. 5a), whereas the rate of seasonal increase in pH was not sig- 
nificantly different between the two groups of lakes (Extended Data 
Fig. 5b). Furthermore, the chemical and hydrological properties of these 
closed-basin sites are representative of an additional ~50 DIC-rich 
hardwater and saline lakes surveyed during the past decade. As shown
elsewhere, inter-annual variation in pH within these closed-basin lakes
produces large-scale changes in CO2 flux that are highly correlated and
synchronous with those observed in the Qu’Appelle lakes.

Figure 2 | Effects of winter temperature on ice cover, lake chemistry and
CO2 flux in lakes of central Canada. a, b, Least-squares regression analysis of
meteorological and lake variables showing correlation of warmer winter
temperatures (mean daily °C during February to April) with decreased
duration of ice cover (a) and earlier date of ice melting the following spring (b).
c, Correlation of date of ice melt with summer pH. d, Correlation of summer pH
with CO2 flux from hardwater lakes during summer. Regression analyses
were as in Fig. 1, using 11 years with complete data.

Figure 3 | Relationship between duration of ice cover, water-column pH
and oxygen content before ice melt in Buffalo Pound Lake. a, Pearson
correlation analysis showing correlation between shorter ice cover (dashed grey
line) (r = −0.575, P = 0.001) and higher pH (solid black line) in water collected
1.5 m above the lake bottom during March, the month immediately before
ice melt (n = 29). b, Positive correlation between bottom-water pH
(r = 0.832, P < 0.0001) and the oxygen content of bottom waters (mg O2 l⁻¹)
since 1979 (n = 27), consistent with metabolic control of lake-water pH.
c, Linear increase in monthly pH (means ± s.e.m.; n = 23) with time during the
ice-free season (r = 0.968, P = 0.002); the rate did not vary significantly
between years (P > 0.05) or lakes (except Wascana). The slight nonlinearity
at pH ~8.3 reflects the minimum buffering intensity of the
bicarbonate-carbonate system. Together, these findings suggest that longer ice cover
favours metabolic production of CO2 and depression of pH during spring and
the following summer.

Spatially coherent increases in lake water pH and CO2 uptake imply
that reduced ice cover due to atmospheric heating may create a substantial
sink for atmospheric CO2 in this large agricultural region of Canada,
both by increasing pH and by extending the ice-free season for CO2
capture. Regional winter temperature has increased by ~2.5 °C since
the 1900s, resulting in a current annual mean temperature of ~2 °C and
an estimated 50-day decline in ice cover as a result of earlier ice melt.
In addition, this continental climate is characterized by high inter-decadal
variability, such that the maximum duration of ice cover on Qu’Appelle
lakes has also declined by more than 15 days since 1980 (Fig. 3a). Analysis
of land cover using the Saskatchewan Water Security Agency geographic
information system with a resolution of 1:50,000 (or 1:250,000)
provides documentation that permanent and largely hardwater lakes
cover 11,500 km² (8,367 km² at the coarser resolution) of the 236,000 km²
study area. Assuming that all basins have experienced a 100 g C m⁻²
per summer decline in CO2 efflux during the past 15 years (Fig. 1d), we
estimate that regional hardwater lakes may capture 1.15 Mt (0.84 Mt)
more C per summer than they did during the mid-1990s, and note
that this value is equivalent to 34% (25%) of present agricultural CO2
emissions. Although we recognize that such simple up-scaling has its
limitations, we note that our calculations are likely to be highly conservative
because they do not include estimates of CO2 capture for September
and October, months when pH remains elevated and lakes are
normally free of ice.
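
The up-scaling in the preceding paragraph is a unit conversion, and a minimal sketch of it may help readers check the figures. The function name and constants below are illustrative assumptions; only the lake areas (11,500 and 8,367 km²) and the 100 g C m⁻² per summer decline come from the text.

```python
# Up-scaling sketch: lake area (km2) x decline in CO2 efflux (g C m-2 per summer).
M2_PER_KM2 = 1_000_000   # square metres in one square kilometre
G_PER_MT = 1e12          # grams in one megatonne


def extra_capture_mt(lake_area_km2, efflux_decline_g_c_m2):
    """Additional C captured per summer (Mt) for a given lake area."""
    grams = lake_area_km2 * M2_PER_KM2 * efflux_decline_g_c_m2
    return grams / G_PER_MT


fine = extra_capture_mt(11_500, 100)    # area mapped at 1:50,000
coarse = extra_capture_mt(8_367, 100)   # area mapped at 1:250,000
print(round(fine, 2), round(coarse, 2))
```

Both results round to the 1.15 Mt and 0.84 Mt quoted above. The sketch assumes, as the text does, that every basin experienced the same decline in efflux.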

Lakes are significant components of the global carbon budget, but
show high variation in both the direction and magnitude of C fluxes
and their response to global climate change. Although further research
is required for an evaluation of the significance of DIC-rich lakes in the
global C cycle (Extended Data Fig. 6), we note that long-term monitoring
of the northern Caspian Sea, a hardwater site that accounts for
>40% of global inland waters, reveals a decline in ice cover similar to
that observed here, as well as an increase in pH to nearly 9.0 (ref. 32)
from earlier and lower levels. Instead, our study provides the first evidence
that atmospheric warming during winter has the potential to
increase the pH of hardwater lakes synchronously in a large geographic
region, greatly increase the rate of CO2 sequestration by hard waters,
and partly offset agricultural emissions at the subcontinental scale. Further,
this work shows that global warming does not invariably increase
CO2 emissions from aquatic ecosystems.


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS

Depth-integrated samples were collected every two weeks at standardized times
and locations between May and August during 1995-2010, except at Wascana Lake
(1996-2010), as part of the Qu’Appelle long-term ecological research programme.
For comparison, an additional 21 closed-basin, hardwater and saline lakes were
sampled three to five times each summer during 2002-2010 (except 2006) using
the same methods. Finally, water chemistry parameters in Buffalo Pound Lake
have been analysed weekly since 1979. Concentration (mmol C m⁻³), partial
pressure (pCO2) and chemically enhanced flux of CO2 (mmol C m⁻² d⁻¹) in Qu’Appelle
lakes were calculated on each sampling date and interpolated to estimate mean
summer conditions as described in refs 9 and 10. Changes in CO2-related parameters
were compared with environmental variables known to influence water chemistry,
production and CO2 flux, using unreplicated linear regression performed with
SYSTAT version 10. Causes and correlates of pH decline during winter were
modelled for Buffalo Pound Lake by using Geochemist’s Workbench version 9.0.9 and
elastic net analysis in R, respectively. No statistical methods were used to
predetermine sample size.

Study sites. Six study lakes are located within the Qu’Appelle River catchment, a 
lotic system that drains 52,000 km² in southern Saskatchewan, Canada (50° 00’-
51° 30’ N, 101° 30’-107° 10’ W) (Extended Data Fig. 1). Sites range from
mesotrophic upstream lakes (Diefenbaker, Buffalo Pound and Last Mountain lakes) to
eutrophic downstream sites (Katepwa and Crooked lakes), with hypereutrophic
Wascana Lake located in the City of Regina (Extended Data Table 1). Dissolved
inorganic (DIC) and organic carbon (DOC) concentrations are high (30-60 and
5-16 mg l⁻¹, respectively) and tend to increase with distance from headwaters,
with the exception of subsaline Last Mountain Lake, which had elevated levels of
both DIC and DOC. In general, lakes have moderate to high flushing rates (water
residence time <1.5 years) except Last Mountain Lake (~12 years), whereas all
basins show high conductivity (400-1800 µS cm⁻¹) and have elevated mean
summer pH (catchment mean pH = 8.8). Lakes are polymictic in most years, except
occasionally dimictic Katepwa Lake. 

Estimates of pH, pCO2 and chemically enhanced CO2 flux in Qu’Appelle lakes
were compared with those determined for a series of 21 hardwater and saline basins
spanning an additional area of 100,000 km² within southern Saskatchewan.
In all cases, survey lakes lacked visible surface-water inflow and showed elevated
pH (8.4-9.3) and carbon content (DOC = 10-159 mg C l⁻¹; DIC = 18-500 mg C l⁻¹),
but differed greatly in size (area = 0.5-60.0 km²; maximum depth = 1.3-30 m), mean
nutrient concentrations (orthophosphate 9-610 µg l⁻¹) and salinity (total dissolved
solids 0.4-50.7 g l⁻¹). Comparison of mean and variance of main chemical
parameters revealed that seasonally sampled lakes were representative of a further 50
closed-basin lakes within southern Saskatchewan.

Limnological sampling. Depth-integrated samples were collected every two weeks 
at standardized times and locations between 1 May and 31 August during 1995- 
2010, except Wascana Lake (1996-2010), for a comprehensive suite of limnological 
variables including dissolved C species, pH, nutrient content, conductivity, O2 
content, and plankton abundance, production and composition. In contrast,
the 21 hardwater and saline lakes lacking surface water efflux (closed basin) were
sampled three to five times each summer with standard methods during 2002-
2010 (except 2006) to quantify the degree to which the Qu’Appelle lakes
represented the broader prairie landscape. Finally, the chemistry of water obtained
from 1.5 m above the bottom of Buffalo Pound Lake (3 m depth at the sampling
location) has been analysed using standard methods since 1979, with comprehensive
water chemistry analyses conducted at weekly intervals from 1985 to 2003. 

Carbon fluxes and regulation. Concentration (mmol C m⁻³), partial pressure (pCO2)
and chemically enhanced flux of CO2 (mmol C m⁻² d⁻¹) were calculated on each
sampling date from depth-integrated DIC concentrations (mg C l⁻¹), surface-water
pH and observed wind speed (m s⁻¹) after correction for ionic strength and water
temperature by using equations for both freshwater and saline ecosystems, as
detailed in refs 9 and 10. In brief, pCO2 (Pa) was estimated by using Henry’s Law
constant and accounted for changes in temperature. Chemically enhanced CO2
flux was calculated for each sampling date in accordance with the boundary layer
equations in ref. 38:

$$\text{net daily CO}_2\ \text{flux} = \alpha k \left( [\mathrm{CO_2}]_{\mathrm{lake}} - [\mathrm{CO_2}]_{\mathrm{sat}} \right)$$

where $[\mathrm{CO_2}]_{\mathrm{lake}}$ is the concentration of CO2 in the surface water (µmol l⁻¹),
$[\mathrm{CO_2}]_{\mathrm{sat}}$ is the concentration of CO2 at equilibrium with the atmosphere (µmol l⁻¹),
$\alpha$ is the chemical enhancement of CO2 flux at high pH and was calculated from
the equations in ref. 40, and $k$ is piston velocity (cm h⁻¹) determined from
equation (5) in ref. 38 relating $k$ to wind speed, and accounting for temperature. Mean
wind speeds at each lake were calculated by averaging observations over all
sampling dates because there were no significant differences in wind speed by month or
year at a given site; they varied from 2.8 ± 2.0 m s⁻¹ (Wascana Lake) to 4.3 ± 2.7 m s⁻¹
(Last Mountain).
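
As a rough illustration of the boundary layer equation above, the sketch below evaluates the flux for two made-up water samples. The function name and all input values are hypothetical, and the enhancement factor and piston velocity are passed in directly rather than derived from pH, wind speed and temperature as in refs 38 and 40. Reading positive values as efflux is our interpretation of the sign of the concentration gradient.

```python
def net_daily_co2_flux(alpha, k_cm_per_h, co2_lake_umol_l, co2_sat_umol_l):
    """Net daily CO2 flux (mmol C m-2 d-1) = alpha * k * ([CO2]lake - [CO2]sat)."""
    k_m_per_d = k_cm_per_h * 24 / 100   # cm h-1 -> m d-1
    # 1 umol l-1 equals 1 mmol m-3, so the gradient needs no conversion.
    gradient_mmol_m3 = co2_lake_umol_l - co2_sat_umol_l
    return alpha * k_m_per_d * gradient_mmol_m3


# Hypothetical inputs, not values from the study.
undersaturated = net_daily_co2_flux(alpha=2.0, k_cm_per_h=5.0,
                                    co2_lake_umol_l=5.0, co2_sat_umol_l=15.0)
supersaturated = net_daily_co2_flux(alpha=1.0, k_cm_per_h=5.0,
                                    co2_lake_umol_l=40.0, co2_sat_umol_l=15.0)
print(undersaturated, supersaturated)
```

With these inputs the first sample takes up CO2 (negative flux) and the second releases it (positive flux), which is the switch in direction that the study links to high pH.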

Mean summer pCO2 and total chemically enhanced CO2 flux (g C m⁻² per
summer) were estimated for each lake and year by interpolation of two-weekly or monthly
measurements and integration over the summer without weighting values for
differences in lake area (no substantial effect). Time series of environmental
variables known to influence water chemistry, production and CO2 flux in hardwater
and saline lakes were compared with those of catchment means by using unrepli- 
cated linear regression to identify potential mechanisms regulating inter-decadal 
variation in C flux. Predictor time series were obtained from relevant local, regional 
and national sources and included air temperature, irradiance, precipitation,
evaporation, river discharge and ice cover as well as global climate indices
including the Southern Oscillation Index, the Pacific Decadal Oscillation and the North
Atlantic Oscillation. In all cases there was no significant temporal autocorrelation 
within annually resolved time series. Cross-correlation analysis of untransformed 
variables was used to determine that there were no significant lagged relationships 
between time series. All regression analyses were performed with SYSTAT version 10. 

Elastic net analysis. Elastic net analysis was used to identify and rank predictors
of changes in under-ice pH in Buffalo Pound Lake during winter in the period
1985-2003. Water chemistry was analysed weekly with standard procedures for
samples collected from 1.5 m above the lake bottom at the 3.0-m-deep site between
the date of ice-cover formation and the date of ice melt. Water parameters included
dissolved oxygen (O2), sodium (Na⁺), carbonate (CO3²⁻), logₑ-transformed
dissolved aluminium (logₑAl), fluoride (F⁻), temperature (temp), potassium (K⁺),
logₑ-transformed orthophosphate (logₑPO4³⁻), calcium (Ca²⁺), dissolved
magnesium (Mg²⁺), logₑ-transformed nitrite + nitrate (logₑNO3⁻), logₑ-transformed
dissolved manganese (logₑMn), bromide (Br⁻), total phosphorus (TP), dissolved iron
(Fe), chloride (Cl⁻), logₑ-transformed ammonium (logₑNH4⁺), bicarbonate (HCO3⁻)
and sulphate (SO4²⁻). Dates of permanent ice formation and melt were obtained
from ref. 19 and from unpublished records of P.R.L. 

We chose the elastic net analysis over an ordinary least-squares (OLS) multiple 
regression approach because OLS solutions are highly sensitive to small changes in 
input data (such as transformations or covariates) and because OLS imposes a hard 
threshold on the size of model coefficients (included or excluded). Although partial 
least-squares (PLS) regression can be used in place of OLS, especially when there 
are many correlated covariates to select from, we also preferred elastic net analysis 
over PLS because the former will generate sparse or parsimonious models that include 
only substantial predictors, whereas PLS models commonly include contributions 
from all covariates, irrespective of magnitude. A secondary advantage of the elastic 
net is that it provides only one coefficient per predictor, thereby simplifying model 
interpretation relative to PLS analysis. 

The elastic net is a regression method that minimizes a penalized deviance
criterion, which places restrictions on the size of the regression coefficients, in terms
of both squared and absolute values. In our application, residual sum of squares was
used as the measure of deviance. The elastic net penalty takes the form

$$\lambda \sum_{j=1}^{P} \left( \alpha\,|\beta_j| + (1-\alpha)\,\beta_j^{2} \right)$$

where $\beta_j$ is the regression coefficient for the jth covariate and $\alpha$ is a mixing
parameter, which controls the relative weighting of the lasso (absolute) and ridge (squared)
contributions and which we set to 0.5. The first term in the summation aims at
producing a sparse solution with some $\beta_j = 0$; the second term handles highly
correlated variables by averaging their coefficients. In this formulation, $\lambda$ controls
the amount of shrinkage applied to the $\beta_j$. If $\lambda = 0$, the effect of the elastic net penalty
is cancelled and the $\beta_j$ take their full least-squares regression solutions. As $\lambda$ increases
from zero, the coefficients are progressively shrunk until as $\lambda \to \infty$ only the
constant term (the model intercept) remains in the model. k-fold cross-validation was
used to find an optimal value for $\lambda$, and we chose the simplest model (that is, the
largest value of $\lambda$) within one standard error of the model with lowest cross-validation
mean squared error in the interests of parsimony. The shrinkage of the coefficients 
implies bias in their estimated values, which we trade off with a reduction in model 
variance arising from the sparsity of the solution and the handling of collinear 
covariates. To fit the elastic net model, covariates were standardized to zero mean, 
unit variance. Hence the absolute size of the estimated elastic net coefficients gives 
an indication of the relative predictive importance for pH of each covariate. The 
elastic net model was fitted with the glmnet package (version 1.9-8) for R (version
3.1.1 166455).
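
The two ideas in this paragraph, the penalty and the one-standard-error rule for choosing λ, can be sketched in a few lines of plain Python. This is an illustration only and not the glmnet fitting routine: it assumes the penalty has the form λ Σ (α|βj| + (1 − α)βj²), and the cross-validation errors are hypothetical. Note that glmnet's own parameterization halves the squared term, so its λ values are not directly comparable with this sketch.

```python
def elastic_net_penalty(betas, lam, alpha=0.5):
    """lam * sum(alpha*|b| + (1 - alpha)*b**2) over the coefficients."""
    return lam * sum(alpha * abs(b) + (1 - alpha) * b ** 2 for b in betas)


def one_se_lambda(lambdas, cv_mse, cv_se):
    """Largest lambda whose CV error is within one s.e. of the minimum error."""
    best = min(range(len(cv_mse)), key=cv_mse.__getitem__)
    threshold = cv_mse[best] + cv_se[best]
    return max(l for l, m in zip(lambdas, cv_mse) if m <= threshold)


# Hypothetical cross-validation results (not taken from the study).
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_mse = [0.052, 0.050, 0.055, 0.120]
cv_se = [0.006, 0.006, 0.007, 0.015]

chosen = one_se_lambda(lambdas, cv_mse, cv_se)
penalty = elastic_net_penalty([0.14, 0.04, 0.0], lam=chosen)
print(chosen, penalty)
```

Here the lowest error occurs at λ = 0.01, but λ = 0.1 is still within one standard error of it, so the rule picks the larger value and therefore the sparser model.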

We present the results of the elastic net process via the entire path diagram of the 
coefficients (Extended Data Fig. 3). The path diagram shows the L1 norm (sum of 
the absolute values of $\beta_j$) for models along the path from the full OLS solution on
the right to a model containing just a constant term on the left. Consequently, the
value of $\lambda$ increases from right to left in the plot. The y axis on the path diagram
indicates the value of the standardized estimate for the $\beta_j$. The most parsimonious
model (the simplest model within one standard error of the best model) is
indicated by the dashed line. An indication of the model complexity (degrees of freedom)
is shown on the upper margin of the figure.

Geochemical modelling. Geochemical modelling was used to evaluate the
importance of changes in under-ice CO2 content on the pH of Buffalo Pound Lake for the
period in which comprehensive chemical data were obtained at weekly intervals
(1985-2003). As noted above, water was collected continuously from 1.5 m above
the lake bottom by the Buffalo Pound Water Treatment Plant and analysed with
standard protocols for concentrations of all major ions, as well as dissolved oxygen
(O2), oxygen saturation (%), pH, temperature, turbidity, alkalinity, total dissolved 
solids, hardness, silica, orthophosphate, total phosphorus, aluminium (dissolved, 
particulate and total Al), dissolved and total magnesium, dissolved and total iron, 
nitrite + nitrate, ammonium, total nitrogen, dissolved and total manganese, bromide, 
fluoride, dissolved organic nitrogen, dissolved organic carbon and selected micro- 
bial parameters. 

Calculations of water-column chemistry response to CO2 production and efflux
were performed with Geochemist’s Workbench version 9.0.9 outfitted with the
standard thermo.v8.r6+.dat database and the extended Debye-Hückel model for
aqueous species activity coefficients. Mineral precipitation was suppressed in
all calculations, which is common practice for the low temperatures encountered
here. This decision is further justified by the fact that most of the samples in this
study are supersaturated with respect to CaCO3 and are therefore not governed by
equilibrium with respect to this phase. 

Two types of calculation were employed to explain the relationship between 
under-ice CO2 production and decreases in solution pH. First, CO2 was
computationally added to the in situ aqueous system observed in the autumn until the pH
declined to the minimum value observed in the spring. Second, actual CO2
production under ice was estimated from declines in measured under-ice O2
concentrations assuming a respiratory quotient of 1.2 (ref. 50). In both cases, calculations
were run for each winter season using chemical parameters recorded at the date of
complete ice cover in fall and the date of complete ice melt in spring. In addition,
effects of CO2 were calculated for intervals of equal ice-cover duration, but offset
from the date of ice formation by one week in either direction (ice on + 1 week, ice
on − 1 week). For each year, estimates of CO2 effects were calculated as the mean of the
three sets of calculations, whereas overall effects were estimated as the mean ± s.e.m.
for all years in which complete water chemistry was available (usually n = 17).
Finally, these calculations were repeated for the interval between ice formation and
the pH minimum observed in situ, usually in mid-March. In all situations,
measured solution chemistry was inputted directly into the Geochemist’s Workbench
React module, and the effect of CO2 additions on pH was determined by
incrementally adding CO2 to the modelled solution and iteratively calculating the
distribution of aqueous species until they met constraints imposed on the system by
charge and mass balance and aqueous species equilibrium constants. 
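
The second type of calculation starts from a simple conversion of the observed O2 decline into CO2 production. A minimal sketch follows, assuming O2 is reported in mg l⁻¹; the function name, the example concentrations and the clamping of negative declines to zero are illustrative assumptions, and the speciation step performed in Geochemist's Workbench is not reproduced.

```python
RESPIRATORY_QUOTIENT = 1.2   # mol CO2 produced per mol O2 consumed (from the text)
O2_MOLAR_MASS = 32.0         # g mol-1


def co2_from_o2_decline(o2_ice_on_mg_l, o2_ice_off_mg_l, rq=RESPIRATORY_QUOTIENT):
    """Estimated under-ice CO2 production (mmol l-1) from a decline in O2."""
    o2_decline_mmol_l = (o2_ice_on_mg_l - o2_ice_off_mg_l) / O2_MOLAR_MASS
    # Assumption: a rise in O2 is treated as zero respiratory CO2 production.
    return max(o2_decline_mmol_l, 0.0) * rq


# Hypothetical winter: O2 falls from 12.0 to 4.0 mg l-1 under ice.
co2_added = co2_from_o2_decline(12.0, 4.0)
print(co2_added)
```

The resulting quantity is what would be added incrementally to the modelled solution before its pH is recalculated.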

Effects of ice melt on spring CO2 efflux and lake-water pH were also calculated
using Geochemist’s Workbench by incrementally subtracting CO2 lost to the
atmosphere from the aqueous solutions observed at the time of ice melt. In this case,
annual time series were each standardized to the date of complete ice melt (week = 0),
and changes in aqueous chemistry were analysed at weekly intervals from one month
before ice melt (week = −4) to one month after ice melt (week = +4). Here observed
declines in concentrations of total inorganic carbon (TIC) and its predominant anion
(HCO3⁻) during spring were assumed to result from the interaction of several
competing processes including CO2 evasion, precipitation of CaCO3, influx of
inorganic C in runoff and dilution of TIC by meltwater. Unique effects of dilution
were estimated from an analysis of the percentage decline in concentration of
conservative ions, chloride (Cl⁻) and fluoride (F⁻), during the two weeks after ice melt,
whereas precipitation of CaCO3 was evaluated on the basis of changes in
concentration of CO3²⁻ and temporal variation in the CaCO3 saturation index. After allowing
for these processes, residual decline in TIC concentration was used as an estimate
of the maximum possible loss of C to the atmosphere. This calculated CO2 loss was
incrementally subtracted from the fluid chemistry observed at the date of ice melt
(week = 0) and its effect on lake-water pH was evaluated by comparing the observed
and calculated pH changes during the two weeks after ice melt. Estimates of the
magnitude of CO2-induced pH change and unreplicated linear regression were used
to evaluate the importance of vernal CO2 efflux in potentially decoupling spring and
summer pH. 
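
The dilution correction described above can be checked with the values reported in the caption of Extended Data Fig. 4 (TIC decline of 6.57 mg C l⁻¹, equal to 14.6% of the stock, and a 12.7% decline in conservative ions). The sketch below is a simplified reading of that procedure with hypothetical function and variable names; it treats CaCO3 precipitation and runoff inputs as negligible.

```python
def partition_tic_decline(tic_decline_mg_l, tic_decline_pct, tracer_decline_pct):
    """Split a spring TIC decline into dilution and residual (maximum CO2 loss)."""
    tic_stock = tic_decline_mg_l / (tic_decline_pct / 100)   # TIC at week 0
    dilution = tic_stock * tracer_decline_pct / 100          # share due to meltwater
    residual = tic_decline_mg_l - dilution                   # upper bound on evasion
    return tic_stock, dilution, residual


stock, dilution, residual = partition_tic_decline(6.57, 14.6, 12.7)
print(round(stock, 1), round(dilution, 3), round(residual, 3))
```

The residual of about 0.86 mg C l⁻¹ and the dilution term of about 5.71 mg C l⁻¹ agree with the caption to within rounding.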

Changes in partial pressure of CO2 (pCO2) and CaCO3 saturation index (ΩCaCO3)
were calculated for the two-month interval bracketing the date of ice melt by
inputting fluid chemistry recorded at 1.5 m depth for all relevant samples and solving the
distribution of species, as discussed above. pCO2 was calculated according to

$$p_{\mathrm{CO_2}} = \frac{a_{\mathrm{CO_2(aq)}}}{K_{\mathrm{CO_2}}}$$

where $a_{\mathrm{CO_2(aq)}}$ is the activity of CO2(aq) in solution (assumed to be equivalent to
its molality) and $K_{\mathrm{CO_2}}$ is the Henry’s Law constant for CO2 dissolution. ΩCaCO3 was
calculated according to

$$\Omega_{\mathrm{CaCO_3}} = \frac{a_{\mathrm{Ca^{2+}}}\, a_{\mathrm{CO_3^{2-}}}}{K_{\mathrm{calcite}}}$$

where $a_{\mathrm{Ca^{2+}}}$ and $a_{\mathrm{CO_3^{2-}}}$ are the activities of Ca²⁺ and CO3²⁻, respectively, calculated
by multiplying the iteratively determined species molality by its activity coefficient,
and $K_{\mathrm{calcite}}$ is the equilibrium constant for calcite dissolution obtained from the
thermodynamic database. Estimates of pCO2 derived from Geochemist’s Workbench
were very highly correlated (r² = 0.998, P < 0.0001) with those obtained using
protocols in refs 9 and 10.
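
Both expressions above are simple ratios once the activities are known. The following sketch evaluates them for made-up activities; the function names and inputs are hypothetical, and the two constants are commonly quoted approximate values near 25 °C used purely for illustration, not the values applied in the study.

```python
def pco2_atm(a_co2_aq, k_h):
    """Partial pressure of CO2 (atm) = activity of CO2(aq) / Henry's Law constant."""
    return a_co2_aq / k_h


def calcite_saturation_index(a_ca, a_co3, k_calcite):
    """Omega = a(Ca2+) * a(CO3 2-) / K_calcite; values above 1 are supersaturated."""
    return a_ca * a_co3 / k_calcite


# Illustrative constants near 25 C (assumptions, not the study's database values).
K_H = 10 ** -1.47         # mol l-1 atm-1
K_CALCITE = 10 ** -8.48

# Hypothetical activities.
p = pco2_atm(a_co2_aq=1.0e-5, k_h=K_H)
omega = calcite_saturation_index(a_ca=5.0e-4, a_co3=2.0e-5, k_calcite=K_CALCITE)
print(p, omega)
```

In this example the water is about threefold supersaturated with respect to calcite, the condition that the Methods cite when justifying the suppression of mineral precipitation in the model.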

Extended Data Figure 1 | Map of study region in Saskatchewan, Canada.
Hardwater lakes of the Qu’Appelle catchment (triangles) were monitored every
two weeks from May to September during 1995-2010 (ref. 19), and closed-basin
lakes were monitored monthly (open circles) or annually (filled circles)
during 2002-2010 (except 2006). Weekly monitoring of pH occurred at
Buffalo Pound Lake (black triangle) during 1979-2007. All lakes are situated in
prairie grassland ecozones with pronounced precipitation deficits (annual
precipitation − potential evaporation) of 40-60 cm yr⁻¹ (dashed lines).

Extended Data Figure 2 | Principal components analysis of the relationship
between mean annual surface water pH, annual meteorological conditions
and mean summer lake parameters during 1995-2010. a, Ordination of
mean summer pH in Qu’Appelle lakes (n = 6) during 1 May to 31 August in
relation to mean annual meteorological conditions revealed that pH was
correlated positively with mean annual and spring (not shown) temperatures,
correlated negatively with the date of ice melt, and was unrelated to
mean annual levels of precipitation or irradiance. Variables include
log10-transformed mean annual temperature (temperature), total annual
rainfall (rain), total annual precipitation (precipitation), total snowfall (snow),
untransformed daily hours of bright sunlight (irradiance) and the calendar
day of the year when ice was completely melted from the lake surface (ice melt
date). b, Ordination of mean summer pH in relation to coeval chemical,
hydrological and physical conditions in Qu’Appelle lakes, as well as indices of
relevant global climate systems. Abbreviations include water temperature
(T°CH2O), total inorganic carbon (TIC), dissolved organic carbon (DOC),
total dissolved nitrogen (TDN), log10-transformed chlorophyll a (Chl),
conductivity (Cond), log10-transformed soluble reactive phosphorus (SRP),
log10-transformed total dissolved phosphorus (TDP), log10-transformed
dissolved ammonia/ammonium (NH4), turbidity (Secchi depth),
log10-transformed volume of river inflow (inflow), dissolved oxygen (O2),
log10-transformed dissolved nitrite + nitrate (NO3) and climate indices
representing the Pacific Decadal Oscillation (PDO), the North Atlantic
Oscillation (NAO) and the winter (SOIwinter) or annual (SOImean) Southern
Oscillation Index. This analysis reveals that mean summer pH is correlated
positively with the PDO and negatively with the SOI, consistent with the
interpretation that warm winters and reduced ice cover result in higher summer
pH in Qu’Appelle lakes.

Extended Data Figure 3 | Elastic net analysis to identify and rank predictors
of changes in under-ice pH in Buffalo Pound Lake during winter. Water
quality parameters at 1.5 m above the lake bottom were analysed weekly using
uniform methods during 1985-2003 from the date of ice-cover formation to
the date of ice melt. Analysis was performed using 125 weekly observations with
complete water chemistry. Parameters include concentrations of dissolved
oxygen (O2), sodium (Na⁺), carbonate (CO3²⁻), logₑ-transformed dissolved
aluminium (logₑAl), fluoride (F⁻), potassium (K⁺), logₑ-transformed
orthophosphate (logₑPO4³⁻), calcium (Ca²⁺), dissolved magnesium (Mg²⁺),
logₑ-transformed nitrite + nitrate (logₑNO3⁻), logₑ-transformed dissolved
manganese (logₑMn), bromide (Br⁻), total phosphorus (TP), dissolved iron
(Fe), chloride (Cl⁻), logₑ-transformed ammonium (logₑNH4⁺), bicarbonate
(HCO3⁻), sulphate (SO4²⁻) and temperature (temp). Coloured lines indicate
how standardized regression coefficients (y axis, left) develop (right to left)
as the initial pool of predictors (y axis, right) is refined by removing collinear
and non-significant variables. Evaluation of the standardized coefficients of the
most parsimonious model (vertical dashed line; equation under graph)
demonstrates that changes in microbial metabolism (O2 decline × respiratory
quotient of 1.2 = CO2 production) were the main factor regulating variation
in water-column pH under ice, showing a nearly fourfold greater coefficient
(0.14) than did either HCO3⁻ or Ca²⁺ (0.04). Although dissolved logₑAl,
logₑNH4⁺ and CO3²⁻ were also significant predictors of changes in winter pH
(standardized coefficients 0.03-0.07), concentrations of these solutes
(means ± s.e.m.; n = 17) were too low (<0.01 M) to regulate lake-water pH
relative to the effects of changes in O2 (0.62 M), HCO3⁻ (3.69 M) or Ca²⁺
(1.22 M). This analysis suggests that metabolically produced CO2 mainly
regulates variation in winter pH by the production of carbonic acid (reduces
pH), but that pH decline is slightly tempered by CO3-induced dissolution
of sedimentary CaCO3, the main form of sedimentary carbon in Buffalo
Pound.

Extended Data Figure 4 | Effects of ice melt on water chemistry and spring
carbon efflux from Buffalo Pound Lake, 1985-2003. a, Mean pH recorded at
1.5 m above the lake bottom from four weeks before to four weeks after ice
melt (week = 0). The rate of pH increase was linear with time (r² = 0.96,
P < 0.0001) with a slightly higher magnitude of increase (0.28 units) occurring
in the two weeks after ice melt. b, Changes in mean concentration (mg C l⁻¹)
of total inorganic carbon (TIC) during spring. The maximum extent of TIC
decline (6.57 mg C l⁻¹; 14.6% of TIC stock at week 0) occurred during the two
weeks after ice melt as a result of a combination of CO2 efflux (0.86 mg C l⁻¹)
and dilution by snowmelt (5.71 mg C l⁻¹), as documented in d. c, Changes
in concentrations of HCO3⁻ and CO3²⁻, and log10 of calculated CaCO3
saturation index. Patterns reveal that the decline in TIC was due to a decrease
in HCO3⁻ concentrations, rather than to the precipitation of CaCO3.
d, Changes in mean concentrations (mg l⁻¹) of chloride (Cl⁻) and fluoride
(F⁻) during spring. Because concentrations of conservative tracers declined by
an average of 12.7% in the two weeks after ice melt, yet TIC declined by 14.6%,
we estimated that 1.9% of the decline in TIC stock at week 0 (0.86 mg C l⁻¹) was
caused by loss of inorganic C, particularly atmospheric evasion. Geochemical
modelling of water chemistry observed at week 0 demonstrated that this
magnitude of CO2 loss should increase the pH by 0.23 ± 0.08 units by week 2, a
value equivalent to the observed increase in pH (see a). e, Observed changes
in water temperature were too small (1.7 °C) to influence water chemistry
strongly during the two weeks after ice melt. f, Calculated changes in chemically
enhanced CO2 efflux modelled with observed water chemistry. Ice cover was
assumed to prevent potential atmospheric CO2 exchange (grey shading),
whereas most CO2 efflux seemed to occur one to two weeks after ice melt. All
water samples were collected weekly at 1.5 m above the bottom of 3.0-m-deep
Buffalo Pound Lake. Error bars represent s.e.m. (n = 17). During each year,
sampling intervals were standardized to the documented week of ice melt
(week = 0) before the calculation of long-term means. Changes in CaCO3
saturation and water-column pCO2 were modelled with observed water-column
parameters and Geochemist’s Workbench version 9.0.9 (see Methods).

Extended Data Figure 5 | Relationship between surface water pH in
hardwater lakes of the Qu’Appelle River catchment and hydrologically
closed lakes of southern Saskatchewan. a, Qu’Appelle lakes (n = 6) were
monitored every two weeks during summer 1995-2010 (ref. 19); closed-basin
lakes (n = 20 in most years) were sampled monthly or seasonally (spring,
summer and autumn) during 2002-2009 (except 2006, when there were no
samples). b, Seasonal change in mean surface water pH of 15 closed-basin
lakes monitored monthly during 2002-2009. Error bars represent one s.e.m.
These patterns demonstrate that the pH of closed-basin lakes varied
synchronously with that of Qu’Appelle lakes on both annual and seasonal
scales, despite large differences in hydrological properties.
Extended Data Figure 6 | Global map of regions where climatic conditions 
and soil types resemble those of southern Saskatchewan, Canada. Hardwater 
lake distribution is not well quantified; however, this map depicts the region 
in which subsoil composition favours hardwater lakes and where climatic 
conditions produce substantial winter ice cover. Soil data originate from the 
FAO-UNESCO Soil Map of the World; regions highlighted in black have 
subsoil concentrations of CaCO₃ in excess of 10% (for example Cambisol, 
Xerosol, Yarsol, Kastanozem and Chernozem soils). These data were overlain 
with temperature data (10 arcminute resolution, averaged monthly during 
1950-2000) obtained from the WorldClim Global Climate Data that were 
restricted to regions where the monthly average temperature was below 0 °C 
for December-February (June-August for the Southern Hemisphere) but 
where the temperature was above 0 °C in October (April in the Southern 
Hemisphere), to exclude high-latitude lakes with permanent ice cover. The 
highlighted area (15,200,000 km²) has pronounced winters and calcareous soils, 
spanning the prairie and steppe regions of North America, South America, 
Europe and Asia. If we assume this region to have a similar surface water 
distribution to that of southern Saskatchewan, the area occupied by permanent 
lakes should be between 740,678 km² (at 1:50,000 scale) and 538,892 km² 
(at 1:250,000). If these basins also experienced a decline in CO₂ efflux of 
100 g C m⁻² per summer during the past 15 years (Fig. 1d), global hardwater 
lakes may have sequestered 74.1 Mt (53.9 Mt at the coarser resolution) more C 
per summer than they did during the mid-1990s, a change greater than 5% 
of global efflux from dilute boreal lakes. This value should increase in the 
future as ice cover declines. 
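
The upscaling in this legend is a unit conversion, and it can be checked with a short sketch. The function name and its default argument are illustrative only; the lake areas and the 100 g C m⁻² decline are the values quoted in the legend.

```python
def extra_carbon_mt(lake_area_km2, efflux_decline_g_c_per_m2=100.0):
    """Extra carbon retained per summer, in megatonnes (Mt) of C."""
    area_m2 = lake_area_km2 * 1e6                      # 1 km^2 = 1e6 m^2
    grams_c = area_m2 * efflux_decline_g_c_per_m2      # total g C per summer
    return grams_c / 1e12                              # 1 Mt = 1e12 g


fine = extra_carbon_mt(740_678)     # lake area mapped at 1:50,000
coarse = extra_carbon_mt(538_892)   # lake area mapped at 1:250,000
print(round(fine, 1), round(coarse, 1))   # 74.1 53.9
```

Both figures reproduce the 74.1 Mt and 53.9 Mt stated in the legend.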
Extended Data Table 1 | Physical and chemical characteristics of hardwater lakes of the Qu’Appelle River catchment, Saskatchewan, Canada 

Lake                Diefenbaker  Buffalo Pound  Last Mountain  Wascana  Katepwa  Crooked
Area (km²)                  500           29.1          226.6      0.5     16.2     15.0
Volume (10⁶ m³)            9400           87.5         1807.2      0.7    233.2    120.9
Mean depth (m)             33.0            3.0            7.9      1.5     14.3      8.1
WRT (yr)                    1.3            0.7           12.6      0.7      1.3      0.5
TDP (µg P L⁻¹)             20.9           27.8           44.9    321.6    152.3    128.8
PO₄³⁻ (µg P L⁻¹)            9.7           35.4           23.6    215.8     99.7     83.3
TDN (µg N L⁻¹)            421.6          511.7          990.3   1441.2   1152.6    948.4
NO₃⁻ (µg N L⁻¹)           171.3           77.3           61.6    156.4    213.4     93.0
NH₄⁺ (µg N L⁻¹)            18.1           32.7           28.1     77.5     74.4     29.7
DOC (mg C L⁻¹)              6.8            7.5           16.4     17.9     13.7     13.3
DIC (mg C L⁻¹)             33.6           32.3           57.9     40.2     48.6     49.8
Cond (µS cm⁻¹)            411.0          468.7         1776.2    900.3   1135.5   1210.7
Chl a (µg L⁻¹)              5.4           30.9           17.2     40.8     26.2     31.3
pH                          8.7            8.7            8.8      9.0      9.0      8.8

Mean summer values (May-August, 1995-2010; n = 128) include water residence time (WRT), total dissolved phosphorus (TDP), orthophosphate (PO₄³⁻), total dissolved nitrogen (TDN), nitrate (NO₃⁻), 
ammonium (NH₄⁺), dissolved organic carbon (DOC), dissolved inorganic carbon (DIC), conductivity (Cond) and algal abundance as chlorophyll a (Chl a). Data are from previous papers and unpublished records 
of P.R.L. 
Spatiotemporal transcriptomics reveals the 
evolutionary history of the endoderm germ layer 


Tamar Hashimshony¹, Martin Feder¹, Michal Levin¹, Brian K. Hall² & Itai Yanai¹ 


The concept of germ layers has been one of the foremost organizing 
principles in developmental biology, classification, systematics 
and evolution for 150 years (refs 1-3). Of the three germ layers, the 
mesoderm is found in bilaterian animals but is absent in species in 
the phyla Cnidaria and Ctenophora, which has been taken as evidence 
that the mesoderm was the final germ layer to evolve. The 
origin of the ectoderm and endoderm germ layers, however, remains 
unclear, with models supporting the antecedence of each as well as a 
simultaneous origin. Here we determine the temporal and spatial 
components of gene expression spanning embryonic development 
for all Caenorhabditis elegans genes and use it to determine the evo- 
lutionary ages of the germ layers. The gene expression program of the 
mesoderm is induced after those of the ectoderm and endoderm, thus 
making it the last germ layer both to evolve and to develop. Strik- 
ingly, the C. elegans endoderm and ectoderm expression programs 
do not co-induce; rather the endoderm activates earlier, and this is 
also observed in the expression of endoderm orthologues during the 
embryology of the frog Xenopus tropicalis, the sea anemone Nema- 
tostella vectensis and the sponge Amphimedon queenslandica. Query- 
ing the phylogenetic ages of specifically expressed genes reveals that 
the endoderm comprises older genes. Taken together, we propose 
that the endoderm program dates back to the origin of multicellu- 
larity, whereas the ectoderm originated as a secondary germ layer 
freed from ancestral feeding functions. 

Embryonic development in C. elegans begins with a series of asymmetric 
cell divisions producing five somatic founder cells (AB, MS, E, 
C, D), each giving rise to a limited number of tissue types, and a single 
germline founder cell (P4) (Fig. 1a). To determine globally the spatiotemporal 
gene expression in the C. elegans embryo, we isolated five blastomeres 
(AB, MS, E, C and P3) that collectively amount to the entire 
embryo and cultured them in vitro to obtain a time course (Fig. 1a 
and Extended Data Fig. 1). The blastomeres divided well in vitro, maintaining 
the expected relative division rates: all AB cells maintained a 
synchronized division rate, while E divided more slowly than MS (Extended 
Data Fig. 1). We analysed the transcriptomes of these collected blastomeres 
using our recently described cell expression by linear amplification 
and sequencing (CEL-seq) method for performing single-cell 
RNA sequencing. To assay the degree to which the cultured blastomeres 
exhibit the expected expression, we also generated a whole-embryo 
CEL-seq time course, spanning the one-cell stage to the free-living larva, 
at 10 min resolution up to muscle movement, and then roughly every 
30 min (Fig. 1a). 

The quality of the data set was assessed in several ways. First, an av- 
erage Pearson’s correlation coefficient of the biological replicates of 0.9 
indicates both that the blastomeres follow similar paths as they differ- 
entiate in isolation and that the CEL-seq method is reproducible (Extended 
Data Fig. 2a). Second, we compared the whole-embryo transcriptomes 
with a weighted sum of the time courses of the five lineages (Fig. 1b), 
and found that the blastomere data mirror the gene expression of the 
whole embryo, at the expected times (circles in Fig. 1b). Third, we show 
that the overall differentiation in vitro is intact, as the blastomere lineages 
express the expected differentiation events (Fig. 1c). Finally, we 
found that these profiles compared well with a previously published set 
of embryonic expression profiles (Extended Data Fig. 2e and Supplementary 
Table 1). Our data reveal the spatial and temporal expression profile 
for each gene (Fig. 1d). For example, unc-120/SRF has expression 
in MS, C and P3, as expected from its known role as a myogenic master 
regulator. 

Since the five lineages each develop in isolation from one another, 
their context in the embryo is lost and, consequently, absence of signalling 
between cell lineages must affect some gene regulation. Most noticeably, 
the specification of the pharynx in the AB lineage is dependent 
upon two Notch signalling events and indeed we do not see expression 
of pharyngeal specification genes in the AB lineage (Extended Data 
Fig. 3a). Thus, although we found that for some genes expected levels 
are maintained (for example wrm-1, a β-catenin-like protein, pal-1/caudal 
and pie-1, a zinc-finger protein; Fig. 1d), for some genes expression is 
higher than in the whole embryo (flp-15; Fig. 1d) and for others expression 
is at lower levels (ceh-27, a homeodomain protein and Y41D4B.26; 
Fig. 1d). We found a general coherence between the time courses: 82% 
of the genes are within one log₂ unit difference (Extended Data Fig. 3b). 
Of the genes that do differ, we found a strong bias for genes with lower 
expression in the blastomere time course as opposed to higher express- 
ion. For 380 genes expressed in the whole-embryo time course, we de- 
tected no expression at all in the blastomere time courses (Supplementary 
Table 2; see, for example, C55B7.3 in Fig. 1d). Genes with ‘missing’ 
expression tend to be expressed late in development (Extended Data 
Fig. 3f), indicating that, although in earlier development very few genes 
are unaccounted for in the data set, by the end of the time course notice- 
able deviations from standard development are apparent. 

Performing principal component analysis on the blastomere tran- 
scriptomes distinguished the three germ layers (Fig. 2a). The three prin- 
cipal components collectively explained 41% of the variation in gene 
expression across the five lineages. The first principal component (PC1) 
correlated with developmental time, reflecting the expression of genes 
with non-specific expression (Extended Data Fig. 4). In general, PC2 
distinguished the endoderm while PC3 distinguished ectoderm from 
mesoderm (Fig. 2a). The C lineage clusters with the other mesodermal 
lineages, although it produces both muscle and epidermis, probably because 
it contains twice as many muscle cells as epidermal cells. The 
overall distribution of the time courses into germ layers provides evid- 
ence for their distinction at the transcriptomic level. 

To identify the specific genes uniquely expressed in each germ layer, 
we computed the correlation of the expression profile of each of the dy- 
namically expressed genes to all others, and clustered them using hier- 
archical clustering (Fig. 2b). We detected 25 clusters, each comprising 
at least 10 genes. Gene members in a given cluster tended to have the 
same timing and location of expression (Fig. 2b, see right-hand bars). 
Fifty-four per cent of dynamically expressed genes are not specific to 
particular lineages (Fig. 2b), with nearly half deriving from the maternal 
transcriptome. The dynamically expressed genes with lineage specifi- 
city were divided according to their germ layer of expression (Extended 
Data Fig. 5), while further requiring each germ-layer annotated gene to 
have at least two-thirds of its expression in that germ layer (Supplementary 
Table 4). Mapping these to their time of induction in the whole embryo, 
we found that germ-layer-specific expression increases with developmental 
time (Fig. 2c). Moreover, different germ layers initiate their 
programs at different times: first the endoderm, then the ectodermal 
expression and finally the mesodermal expression (Fig. 2c). This general 
pattern is also reflected when examining the dynamics of the germ 
layers through their average expression of the genes (Fig. 3). 
Figure 2 | Dynamics of germ-layer gene expression throughout 
development. a, Principal component analysis on dynamically expressed 
genes for the five lineage time courses (see Extended Data Fig. 4 for principal 
component 1). Adjacent stages of the same lineage are connected by a line; the 
terminal stage is indicated by a circle. b, Heat map indicating Pearson’s 
Figure 1 | Determining the expression profiles of 
the C. elegans embryonic founder cell lineages. 
(Extended Data Table 3 and Extended Data 
Fig. 2d). The white circles indicate the expected 
differentiation of each expression cluster 
(Extended Data Table 1). d, Spatial and 
temporal gene expression profiles. 


The dynamics of the germ-layer expression programs may be unique 
to C. elegans or a general property of animal development. To test 
this, we analysed the previously characterized transcriptomes of the distantly 
related species X. tropicalis, N. vectensis and A. queenslandica. 
For each species, we mapped the orthologues of the C. elegans germ- 
layer genes in the respective genome and computed their average de- 
velopmental expression profiles. We found a general recapitulation of 
the order found in C. elegans (Fig. 3). The onset of the endodermal program 
in Xenopus occurs during gastrulation, well before that of the ectodermal 
and mesodermal programs (P < 0.01, Kolmogorov-Smirnov 
test). In Nematostella, we also detected a major rise in the expression of 
endoderm orthologues during gastrulation (P< 10” *). The observation 
that mesoderm orthologues in Nematostella are expressed in the planula 
is consistent with the notion that the bilaterian mesoderm was co-opted 
from late-expressed genes. In Amphimedon, endoderm orthologues are 
enriched for expression during the ‘brown’ stage, in which two layers first 
become visible. Expression of the orthologues of the ectoderm and me- 
soderm germ-layer genes, in contrast, is seen only in the early stages (P 
<10 *), reflecting that they are solely deposited as maternal transcripts. 

The distinct and conserved temporal inductions of germ-layer-specific 
expression (Fig. 3), with the mesoderm both appearing last in evolutionary 
timescales and developing last in the embryo, support accretion 
of processes as a mechanism in the evolution of development. Extending 
this reasoning to the endoderm suggests that it originated before 
the ectoderm. According to this scenario, the endoderm is expected to 
express genes of older origin. To test this, we studied gene ages using the 
phylostratigraphy approach, which infers a gene’s age from the phylogenetic 
breadth of its orthologues. For a set of temporal stages, we 
computed for genes dynamically expressed at those times the fraction 
having orthologues in non-metazoan opisthokont eukaryotes. Using 
this analysis, we found that genes expressed in mid-development are 
generally of older origin than those expressed at other embryonic stages 
(Fig. 4a and Extended Data Fig. 6), consistent with previous analyses. 
Examining the evolutionary age of the individual germ layers, we found 
that genes specifically expressed in the endoderm have a significantly 
higher fraction of older genes (P< 10°, 7” test). In contrast, the ecto- 
derm and mesoderm genes are significantly younger (P< 107°, 7’ test). 

Since the phylogenetic analysis revealed that endoderm genes com- 
prise genes of older origin, we enquired into their functional properties. 
We found that endoderm-specific genes are enriched for energy pro- 
duction, metabolism and transport functions (Fig. 4b and Extended Data 
Fig. 7). The observation that the endoderm is enriched in general feed- 
ing functions suggests that it is closer, relative to the ectoderm, in its 
characteristics to the choanoflagellate-like ancestor. To test this, we 


Figure 4 | The germ layers exhibit distinct gene ages and functional category 
enrichments. a, Fraction of ‘old’ genes—defined as presence of orthologues 
in other opisthokont eukaryotes—across the indicated temporal induction 
clusters and germ layers. Different gene age thresholds show similar results 
(Extended Data Fig. 6). b, For the functional categories shown, the bars indicate 
the fraction of genes in the endoderm gene set, ectoderm gene set, and other 
dynamic and zygotically expressed genes. Asterisks indicate significant 
endoderm (green) and ectoderm (blue) enrichments (P < 0.01, hypergeometric 
distribution). c, The fraction of orthologues in M. brevicollis is indicated for 
each functional category. d, A model for germ-layer evolution. 
examined the level of orthology with the choanoflagellate Monosiga 
brevicollis for each of the functional classes. Indeed we found a higher 
fraction of M. brevicollis orthologues in endoderm-enriched functional 
classes, such as transport and metabolism (Fig. 4c), suggesting that the 
endoderm is most closely aligned with the feeding capabilities of the 
free-living choanoflagellates. Moreover, while transport and metabol- 
ism appear to be related to ‘housekeeping’ functions, we observe, in 
contrast, that they are induced early on in embryogenesis in the endo- 
derm germ-layer program. 

Our results shed light on the evolutionary history of the endoderm 
germ layer (Fig. 4d). At the dawn of the metazoans, choanoflagellate-like 
colonial organisms comprised individual cells that probably all retained 
feeding functions. However, with the evolution of epithelial cells, the 
possibility of distinct cell-types emerged, as cells could communicate by 
strong membrane connections. Our analysis of the composition and dynamics 
of the germ-layer transcriptomes leads us to propose that the 
endoderm program has retained the feeding functions of its choanoflagellate-like 
ancestor. Expression in the Amphimedon sponge is informative 
since physical layers of epithelia exist in this organism. The 
expression of sponge orthologues of the endoderm gene set suggests that 
Amphimedon only has a functional ‘proto-endoderm’ germ layer. This 
is also supported by recent evidence that the GATA gene in Amphimedon 
is expressed in the internal layer in the sponge. 

In the lineage leading to the eumetazoans, the transport and meta- 
bolic functions performed by internal cells may have allowed the exter- 
nal cells to specialize into an ectodermal germ layer (Fig. 4d). In this 
model, the ancestry of the endoderm follows from its role in feeding, 
whereas only later in evolution was it coupled with its current function 
as the gastrulating internal layer. This scenario is in line with Haeckel’s 
gastrea hypothesis, which posits a layered spherical organism as the 
urmetazoan. However, our model of feeding processes driving selection 
of the endodermal identity is also consistent with an ancestral flattened 
placula, as proposed by Bütschli, that subsequently evolved into 
a two-layered stage where the lower epithelia specialized in digestion. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Blastomere isolation and culturing. Egg shells were removed from C. elegans 
embryos and the resulting blastomeres cultured as previously described. The egg 
shell and vitelline membrane were removed at the two-cell stage, and the embryo 
separated into the AB and P1 blastomeres by pipetting. P1 was allowed to undergo 
one cell division and separated into EMS and P2, or two cell divisions before being 
separated into the MS, E, C and P3 blastomeres, to allow the Wnt signalling from P2 
to EMS (Extended Data Fig. 1). The five lineages were cultured in a humid chamber 
in EGM, and division of the E blastomere was used as a clock (Extended Data 
Table 2). All lineages from a single embryo were frozen at the same time. Individual 
samples were transferred with a micro-pipette into a 0.5 µl drop of egg salts 
placed on the cap of a 0.5 ml Lobind Eppendorf tube, excess liquid was aspirated off, 
and the samples frozen in liquid nitrogen. Samples were stored at −80 °C. Samples 
were collected in triplicate; correlations between replicates are shown in Extended 
Data Fig. 2a. Throughout this work, ‘correlation’ denotes Pearson’s correlation 
coefficient. 

Whole-embryo time course. Precisely staged single embryos were collected at the 
one-, two- and four-cell stages, and at 10 min intervals thereafter up to muscle 
movement, then roughly every 30 min; 50 embryos were used in total. RNA from each 
embryo was prepared using TRIzol as previously described with one modification: 
1 µl of the ERCC spike-in kit (1:500,000 dilution) was added with the TRIzol 
to each sample. 

Single cell and whole-embryo transcriptomics. CEL-seq was used to amplify 
and sequence both RNA from the whole embryos and the cultured blastomeres. 
For the whole embryos, RNA was re-suspended in 5 µl water and 1 µl primer added; 
1.2 µl were taken for the amplification. For the blastomeres, 1 µl of a 1:500,000 
dilution of the ERCC spike-in kit and 0.2 µl of the primer were mixed (a total of 
1.2 µl) and added directly to the lid of the Eppendorf tube where the cell was frozen. 
Linear amplification and library preparation were as previously described. Libraries 
were sequenced on an Illumina HiSeq2000 according to standard protocols. Paired-end 
sequencing was performed, reading at least 11 bases for read 1, 35 bases for read 
2, and the Illumina barcode when needed. The complete data set has been deposited 
in the Gene Expression Omnibus database under accession number GSE50548. 
Expression analysis pipeline. Transcript abundances were obtained from the sequencing 
data as previously described. Briefly, libraries were sequenced on an 
Illumina HiSeq2000 according to standard, paired-end sequencing, using the CEL-seq 
protocol. Mapping of the reads used BWA, version 0.6.1, against the C. elegans 
WBCel215 genome (bwa aln -n 0.04 -o 1 -e -1 -d 16 -i 5 -k 2 -M 3 -O 11 -E 4). Read 
counting used htseq-count version 0.5.3p1 defaults, against WS230 annotation exons. 
The counts were normalized by dividing each gene’s count by the total number of 
mapped reads in the sample and multiplying by 10⁶, yielding the estimated gene 
expression levels in transcripts per million (t.p.m.). 
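
The normalization step can be illustrated with a minimal sketch. The function name and the example counts are hypothetical; only the scaling rule (count divided by total mapped reads, times 10⁶) comes from the description above, and no transcript-length correction is applied because none is mentioned.

```python
def counts_to_tpm(counts):
    """Scale per-gene read counts so that the sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


# Hypothetical read counts for one sample.
sample = {"geneA": 30, "geneB": 10, "geneC": 60}
tpm = counts_to_tpm(sample)
print(tpm["geneA"])   # 300000.0
```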

Warped whole-embryo time course. The whole-embryo time course (Extended 
Data Fig. 2c) was compared with the blastomere time courses (Fig. 1b) using a restricted 
set of 4,527 genes with a log₂ fold-change of at least 5 across the 50-embryo 
time course, greater than 100 t.p.m. maximum expression, and less than 10 t.p.m. 
minimum expression. These cutoffs were used to limit analysis to only the most 
dynamically expressed genes given the distinct dynamics of the whole-embryo time 
course. The minimum expression threshold further selected for temporally restricted 
expression. For each blastomere time point, the five lineages were summed up to 
represent the whole embryo, taking into account the fraction of the whole embryo 
represented by the specific lineage (half for AB, one eighth each for E, MS, C and 
P3). An eleven-stage warped whole-embryo time course was generated by taking 
for each stage a weighted average across the 50 embryos based upon the correla- 
tions with the blastomere time course, raised to the tenth power. Different defini- 
tions of this set resulted in very similar warped profiles. 
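
The two steps described here, summing the lineages into a pseudo-embryo and then averaging whole embryos with correlation-based weights, might be sketched as follows. All function names are illustrative. Clipping negative correlations to zero before raising them to the tenth power is an assumption of this sketch, not something the text specifies.

```python
from math import sqrt

# Fraction of the embryo contributed by each lineage, as stated in the text.
LINEAGE_FRACTION = {"AB": 0.5, "MS": 0.125, "E": 0.125, "C": 0.125, "P3": 0.125}


def pearson(x, y):
    """Pearson's correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def pseudo_embryo(lineage_expr):
    """Weighted sum of the five lineage profiles at one blastomere stage."""
    n_genes = len(lineage_expr["AB"])
    return [
        sum(LINEAGE_FRACTION[name] * lineage_expr[name][g] for name in LINEAGE_FRACTION)
        for g in range(n_genes)
    ]


def warped_stage(pseudo, embryos, power=10):
    """Average the whole embryos, weighting each by correlation**power."""
    # Assumption: negative correlations are clipped to zero before weighting.
    weights = [max(pearson(pseudo, e), 0.0) ** power for e in embryos]
    total = sum(weights)
    if total == 0:
        raise ValueError("no embryo correlates positively with this stage")
    return [
        sum(w * e[g] for w, e in zip(weights, embryos)) / total
        for g in range(len(pseudo))
    ]
```

Raising the correlation to a high power concentrates the weight on the few embryos that best match a given blastomere stage, so the result behaves like a soft nearest-neighbour match.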

Spatial and temporal gene expression profiles. In the profiles shown in Fig. 1d, 
the log expression is split among the lineages according to the fraction in the natural 
scale expression. The black line indicates the expression of the whole-embryo time 
course. 

Definition of gene sets for dynamically expressed and differentiation genes. 
The 3,910 dynamically expressed genes were defined based upon the warped 
whole-embryo time course with >3 log₂ fold-change, >10 t.p.m. maximum expression 
and <100 t.p.m. minimum expression (Extended Data Fig. 2b). These parameters 
were adapted to the warped time course, which is less dynamic owing to averaging 
effects. ‘Constitutively expressed’ genes (Extended Data Fig. 3b) were defined as 
highly expressed genes (>500 t.p.m. maximum expression) but not members of 
the dynamically expressed genes. ‘Expressed genes’ (Extended Data Fig. 3b) were 
defined as those with >10 t.p.m. maximum expression. The differentiation gene 
sets (Fig. 1c and Extended Data Fig. 2d) were generated for each group—neurons 
(AB), muscle (MS, C and P3), endoderm (E), epidermis (AB and C), pharynx (MS) 
and germline (P3)—by examining terminal expression in the time courses. Genes 
were assigned to one of the seven sets if they exhibited expression ≥50 t.p.m. in 
that group and a correlation coefficient greater than 0.7 of expression across the 
lineages with the expected expression pattern, as highlighted in red on the lineage 
trees. The parameters were chosen so that the resulting sets were of similar size. 
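
As a rough illustration of the ‘dynamically expressed’ filter defined above, the sketch below applies the three stated thresholds to a single expression profile. The function name is hypothetical, and the pseudocount used to keep the fold-change finite when the minimum is zero is an assumption; the text does not say how zeros were handled.

```python
from math import log2


def is_dynamic(profile, min_log2_fc=3.0, min_peak=10.0, max_trough=100.0,
               pseudocount=1.0):
    """Apply the three thresholds (fold-change, maximum, minimum) to a t.p.m. profile."""
    peak, trough = max(profile), min(profile)
    # Assumption: a pseudocount avoids division by zero for unexpressed stages.
    fold = log2((peak + pseudocount) / (trough + pseudocount))
    return fold > min_log2_fc and peak > min_peak and trough < max_trough


print(is_dynamic([0, 5, 200]))    # True: sharply induced gene
print(is_dynamic([50, 60, 70]))   # False: nearly constant expression
```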
Clusters of temporal gene expression patterns. A correlation coefficient was com- 
puted for each gene’s temporal warped whole-embryo time course against each of 
17 idealized expression profiles (Extended Data Fig. 3c). The idealized profiles were 
constructed based upon average expression of clusters using the k-means algorithm 
and represent the general patterns of the transcriptome. The idealized profiles are 
vectors of the same length (11) as the warped time-course profile but with digital 
expression of three possible values: 0, 1 and 2. Each dynamically expressed gene was 
then assigned to the idealized profile to which it best correlated. Seven of the 17 
idealized profiles correspond to ‘maternal’ profiles (Extended Data Fig. 3c) in which 
expression is initially high and then drops. We collapsed these seven profiles to one 
profile and denoted it as the ‘0’ cluster in Fig. 2b. 
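
The assignment of a gene to its best-matching idealized profile can be sketched as below. The three 11-stage profiles and the example gene are made up for illustration (the actual 17 profiles are in Extended Data Fig. 3c), and the function names are not the authors'.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_profile(gene_profile, idealized):
    """Return (index, r) of the idealized profile with the highest correlation."""
    scores = [pearson(gene_profile, p) for p in idealized]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical idealized profiles over 11 stages, using the values 0, 1 and 2.
idealized = [
    [2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0],   # 'maternal'-like: high, then drops
    [0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 0],   # transient mid-course expression
    [0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2],   # late induction
]
gene = [0, 0, 0, 0, 1, 0, 3, 20, 45, 50, 48]   # hypothetical t.p.m. values
index, r = best_profile(gene, idealized)
print(index)   # 2
```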

Hierarchical clustering and definition of germ-layer genes. Hierarchical clus- 
tering used the ‘linkage’ function in MATLAB using the unweighted centre of mass 
distance (UPGMC) algorithm. The top 20 clusters with at least ten genes were examined 
(Fig. 2b). Clusters in which at least 65% of the genes belonged to the same germ layer 
contributed their genes to the dominant germ layer. Germ layers were assigned 
by correlating the average expression with germ-layer-specific patterns with a cutoff 
of 0.6 correlation with the following idealized vectors: endoderm = [00100]; ecto- 
derm = [10000]; mesoderm = [01011], where the order is AB, MS, E, C and P3. 
Germ-layer genes were defined according to the sum of the genes identified by the 
clusters and are indicated in Fig. 2b. We further filtered the germ-layer gene sets by 
keeping only those genes whose expression was partitioned across the germ layers 
such that at least two-thirds of the expression was in that germ layer. 
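
A minimal sketch of the germ-layer assignment and the two-thirds filter follows. The idealized vectors and the 0.6 cutoff are taken from the text; the function names are illustrative, and computing the two-thirds fraction as summed expression in the layer's lineages over summed expression in all lineages is an assumption about how the partition was done.

```python
from math import sqrt

# Lineage order is AB, MS, E, C, P3, as stated in the text.
GERM_LAYER_VECTORS = {
    "endoderm": [0, 0, 1, 0, 0],
    "ectoderm": [1, 0, 0, 0, 0],
    "mesoderm": [0, 1, 0, 1, 1],
}


def pearson(x, y):
    """Pearson's correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assign_germ_layer(lineage_expr, cutoff=0.6):
    """Return the best-correlated germ layer, or None if below the cutoff."""
    best_layer, best_r = None, cutoff
    for layer, vec in GERM_LAYER_VECTORS.items():
        r = pearson(lineage_expr, vec)
        if r >= best_r:
            best_layer, best_r = layer, r
    return best_layer


def passes_two_thirds(lineage_expr, layer):
    """True if at least two-thirds of total expression lies in the layer's lineages."""
    total = sum(lineage_expr)
    if total == 0:
        return False
    inside = sum(v for v, m in zip(lineage_expr, GERM_LAYER_VECTORS[layer]) if m)
    return inside / total >= 2 / 3


expr = [1, 0, 40, 2, 1]   # hypothetical expression in AB, MS, E, C, P3
layer = assign_germ_layer(expr)
print(layer, passes_two_thirds(expr, layer))   # endoderm True
```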

Gene age. Orthologies were retrieved from the MetaPHoRs project using the 2010 
release. Taxonomies were retrieved from the NCBI Taxonomy. For each C. elegans 
gene, if the gene was also present in at least 25% of the examined non-metazoan 
opisthokont eukaryotes, it was annotated as ‘old’. Similar results were also observed 
for the definition of ‘old’ genes at the level of eukaryotes and cellular life (Extended 
Data Fig. 6). MetaPHoRs were also used to delineate the orthologies shown in Fig. 4c 
for M. brevicollis. 
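
The ‘old’ gene rule reduces to a set comparison, sketched below. The species labels and gene names are placeholders rather than the taxa or genes actually examined, and the function names are illustrative; only the 25% threshold comes from the text.

```python
def is_old(species_with_orthologue, reference_species, min_fraction=0.25):
    """True if the gene has orthologues in at least min_fraction of the reference species."""
    reference = set(reference_species)
    if not reference:
        raise ValueError("reference species list is empty")
    present = len(set(species_with_orthologue) & reference)
    return present / len(reference) >= min_fraction


def fraction_old(gene_to_species, reference_species):
    """Fraction of genes in a set that are annotated as 'old'."""
    flags = [is_old(sp, reference_species) for sp in gene_to_species.values()]
    return sum(flags) / len(flags)


# Placeholder labels standing in for non-metazoan opisthokont species.
reference = ["sp1", "sp2", "sp3", "sp4"]
genes = {"gene1": ["sp2"], "gene2": [], "gene3": ["sp1", "sp3", "sp4"]}
print(fraction_old(genes, reference))
```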

Orthologous gene expression profiles. The developmental time courses of 
A. queenslandica, X. tropicalis and N. vectensis have been previously described. 
For these species, the latest protein annotations were used to detect orthologies as 
follows: A. queenslandica, Aqu2; X. tropicalis, JGI_4.2; N. vectensis, GCA_000209225. 
A. queenslandica orthologies were delineated using OrthoMCL, and those of 
X. tropicalis and N. vectensis were retrieved from BioMart, which contained the 
annotations on the noted versions. We included in the analysis genes whose maximum 
expression was greater than the data-set-specific threshold; this was computed as 
the median average expression across all genes. Expression profiles passing this 
threshold were each normalized to their own maximum expression. A Kolmogorov- 
Smirnov test was used to test for significantly different temporal dynamics between 
endoderm and ectoderm expression. For this analysis the timing of expression for 
each gene was computed as the stage at which half of the sum expression had 
occurred. 
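
The timing measure and the comparison between gene sets might be sketched as follows. Function names are illustrative. Only the Kolmogorov-Smirnov statistic D is computed; turning D into a P value is left to a statistics library, and treating ‘the stage at which half of the sum expression had occurred’ as the first stage where the cumulative sum reaches half the total is an assumption.

```python
def normalize_to_max(profile):
    """Scale a profile so that its maximum is 1."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile has no positive expression")
    return [v / peak for v in profile]


def half_expression_stage(profile):
    """Index of the first stage at which cumulative expression reaches half the total."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile has no positive expression")
    running = 0.0
    for stage, value in enumerate(profile):
        running += value
        if running >= total / 2:
            return stage


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = sum(x <= v for x in a) / len(a)
        fb = sum(x <= v for x in b) / len(b)
        d = max(d, abs(fa - fb))
    return d


# Hypothetical timing values (stage indices) for two gene sets.
endoderm_timing = [half_expression_stage(p) for p in ([0, 4, 5, 1], [1, 6, 2, 1])]
ectoderm_timing = [half_expression_stage(p) for p in ([0, 0, 1, 9], [0, 1, 1, 8])]
print(ks_statistic(endoderm_timing, ectoderm_timing))   # 1.0
```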

Functional categories analysis. COG functional category annotations were 
retrieved from WormMart. For simplicity, annotations of ‘general function prediction 
only’ and ‘function unknown’ were ignored, as well as those categories capturing 
fewer than 3% of the genes. Enrichments were computed using the hypergeometric 
distribution. 
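
For readers unfamiliar with the enrichment test, a one-sided hypergeometric P value can be computed as in the sketch below. The argument names and the example numbers are hypothetical; whether the authors used exactly this upper-tail form is not stated in the text.

```python
from math import comb


def hypergeom_enrichment_p(population, category_total, sample_size, category_in_sample):
    """P(X >= category_in_sample) when drawing sample_size genes without replacement."""
    upper = min(category_total, sample_size)
    favourable = sum(
        comb(category_total, i) * comb(population - category_total, sample_size - i)
        for i in range(category_in_sample, upper + 1)
    )
    return favourable / comb(population, sample_size)


# Hypothetical example: 2,000 genes, 200 in a category, and a germ-layer
# set of 100 genes of which 25 fall in that category.
p = hypergeom_enrichment_p(2000, 200, 100, 25)
print(p < 0.01)
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that arise when the tail probability is very small.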
Extended Data Figure 1 | In vitro culturing of the C. elegans embryonic 
founder blastomeres. The cells are separated as shown in the left 
schematic and then cultured in embryonic growth medium as shown in the 
micrographs on the right. The numbers indicate the stages at which the 
cells were collected for transcriptome analysis. Six of the 11 stages are shown in 
the micrographs. 
Extended Data Figure 2 | A transcriptomic survey of C. elegans embryonic 
founder cell lineages. a, Replicates of the embryonic blastomere time courses. 
The heat maps show the correlations among the replicates for each blastomere 
lineage at each of the eleven examined stages. For three blastomere stages 
there were no replicates. The median correlation coefficient is 0.9. Samples were 
collected in triplicate. Only samples with at least 750,000 reads were used, 
which has been previously shown to be of sufficient sequencing depth for 
CEL-seq. Supplementary Table 3 provides the sequencing statistics for each sample. 
b, Expression profiles of the 3,910 dynamic genes across the blastomere lineage 
time courses. See Methods for definition of dynamic genes. c, Correlation 
coefficients between samples of the whole-embryo time course. Each of the 
50 samples comprises a single embryo, collected at the indicated minutes 
past the four-cell stage. Again, only samples with at least 750,000 reads were 
used and Supplementary Table 3 provides the sequencing statistics for each 
sample. d, The expression profiles of the 1,664 genes with differentiated 
expression analysed in Fig. 1c. Each profile was ‘standardized’ by subtracting its 
mean and dividing by its standard deviation. e, Comparison of the blastomere 
time courses to the EPIC data set. For 115 genes, we could compare gene 
expression to previously published embryonic expression profiles generated by 
microscopic lineaging until the ~300-cell stage. Of these, 75% of our 
profiles had consistent localized expression (Supplementary Table 1). Of those, 
54% matched completely, and 21% of the genes expressed in all of the lineages 
in our data set had some missing expression in the EPIC data set because 
the lineaging was not performed until the end of the developmental process. 
The remaining genes have some overlap in expression. Such differences in 
expression could be caused by the transgene in the EPIC data set not 
recapitulating the profile of the endogenous gene, or missing signals between 
cells in the blastomere data set, as is seen from the whole-embryo/blastomere 
expression level ratio (see Supplementary Table 1, ratios defined as equal, 
slightly higher/lower or much higher/lower). Expression profile compared with 
the EPIC data set deviates more when expression in the blastomeres is low 
compared with the whole embryo, but the blastomere data set has the advantage 
that all genes are assayed simultaneously, no transgenes are used, maternal 
transcripts are seen and downregulation of genes is observable. 
Extended Data Figure 3 | Lineage-restricted gene expression identifies 
genes dependent upon coherence of the lineages and tissue specificity. 
a, Expression profiles of genes involved in pharynx specification. The left and 
right panels correspond to the two Notch signalling events. The top and bottom 
images correspond to the expected regulatory patterns in the whole embryo and 
isolated blastomeres, respectively. The tbx-37 gene is not shown since it is 
identical to tbx-38 in expression profile. b, Comparison of the overall sum of 
expression between the two time courses, plotted on a log₂ scale (black). Genes 
‘missing’ in the separated lineage time course were manually added to the graph 
at −3. The additional plots indicate the same measure for dynamically 
expressed genes (blue) and constitutive genes (red). c, Idealized expression 
profiles used to identify gene expression clusters. d, The gene expression 
profiles for the temporally restricted gene expression profiles. Each profile was 
‘standardized’ by subtracting its mean and dividing by its standard deviation. 
e, Average expression profiles of ten clusters of dynamically expressed genes 
determined on the basis of the whole-embryo expression data (see Methods). 
f, The number of dynamic genes in each temporal period. In each group, the 
genes not expressed in the lineage time course (b) are marked in red. 
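
The 'standardized' profiles mentioned in the captions above can be illustrated with a minimal sketch. It assumes the sample standard deviation (the caption does not say which estimator was used), and the example values are hypothetical rather than taken from the study.

```python
from statistics import mean, stdev

def standardize(profile):
    """Subtract the profile's mean and divide by its standard deviation."""
    m = mean(profile)
    s = stdev(profile)  # assumption: sample (n - 1) standard deviation
    if s == 0:
        raise ValueError("a constant profile cannot be standardized")
    return [(x - m) / s for x in profile]

# Hypothetical expression values across the 11 stages (not data from the study)
example_profile = [0.0, 0.0, 1.5, 4.0, 9.0, 12.0, 10.5, 7.0, 3.0, 1.0, 0.5]
z_profile = standardize(example_profile)
```

After this transformation every profile has mean 0 and unit standard deviation, so genes with very different absolute expression levels can be compared by the shape of their time course alone.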
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Colour codes are the same as in Fig. 1. PC1, PC2 and PC3 capture 18%, 12% and maternal expression). 
Extended Data Figure 5 | Germ-layer-specific expression. Expression
profiles of the germ-layer-specific genes in each of the lineages. The x and y axes
are the 11 examined temporal stages and individual genes, respectively.
Germ-layer-specific genes were identified by hierarchical clustering based upon
correlation among dynamically expressed genes (see Methods).
Extended Data Figure 6 | Robustness of gene age analysis. a, Same format as
Fig. 4a but with the definition of old genes as those present in at least 25% of the
examined eukaryotes (see Methods) that are not opisthokonts. b, Same as
Fig. 4a with a definition of ‘old’ as those present in 25% of the examined
organisms that are not eukaryotes (Eubacteria and Archaea).
Extended Data Figure 7 | Truncated endoderm gene set control. To exclude
the possibility that general genes were included as ‘endoderm-specific’ because
the endoderm program is induced earlier, we excluded temporal clusters 8,
9 and 10 from the endoderm genes and repeated the relevant analyses.
We found that there was no marked change in the results. The results are shown
in the same format as Figs 3 and 4b, c.
Extended Data Table 1 | The fates of the progeny of each blastomere in vivo and in isolated cultured blastomeres

Blastomere  Fates in whole embryo  Expected in vitro  References
AB          Neurons                Unknown
            Epidermis              Yes                40
            Pharynx                No                 41
            1 muscle cell          Unknown
MS          Muscle                 Yes                42
            Pharynx                Yes                42
E           Endoderm               Yes                43,44
C           Muscle                 Yes                40
            Epidermis              Yes                40
P3 D        Muscle                 Yes                40
P4          Germ line              Unknown

Data are from refs 40-44.
Extended Data Table 2 | Description of the developmental stages queried in this study

Stage number  Stage name  Description                                                 Time*
1             2-cell      2-cell embryo                                               0
2             4-cell      4-cell embryo                                               20
3             E           After division of EMS to E and MS                           40
4             2E          After division of E to Ea and Ep                            60
5             2E+         After division of MSa and MSp to MSaa, MSap, MSpa and       90
                          MSpp
6             4E          After division of Ea and Ep to Eal, Ear, Epl and Epr        110
7             4E+         60 minutes after division of Ea and Ep to Eal, Ear, Epl     140
                          and Epr
8             8E          After division of Eal, Ear, Epl and Epr to Eala, Ealp,      180
                          Eara, Earp, Epla, Eplp, Epra and Eprp
9             8E+         90 minutes after division of Eal, Ear, Epl and Epr to       na
                          Eala, Ealp, Eara, Earp, Epla, Eplp, Epra and Eprp
10            8E++        180 minutes after division of Eal, Ear, Epl and Epr to      na
                          Eala, Ealp, Eara, Earp, Epla, Eplp, Epra and Eprp
11            o.n.        After an overnight incubation; more than 8 E cells are      na
                          visible.

* Timing of the stage in the Sulston lineage. Timing is indicated as minutes from the 2-cell stage.
Extended Data Table 3 | Tissue-specific gene sets

Tissue     Gene sets
Neuronal   Genes with the following GO terms:
             GO:0001764  neuron migration
             GO:0004983  neuropeptide Y receptor activity
             GO:0005328  neurotransmitter:sodium symporter activity
             GO:0006836  neurotransmitter transport
             GO:0007218  neuropeptide signaling pathway
             GO:0007268  synaptic transmission
             GO:0007411  axon guidance
             GO:0008021  synaptic vesicle
             GO:0030424  axon
             GO:0030425  dendrite
             GO:0030594  neurotransmitter receptor activity
             GO:0043005  neuron projection
             GO:0045202  synapse
             GO:0045211  postsynaptic membrane
             GO:0048489  synaptic vesicle transport
             GO:0048666  neuron development
Muscle     Genes identified by Fox et al.
Endoderm   Genes identified by McGhee et al.
Epidermis  Genes with the following GO term:
             GO:0018996  molting cycle, collagen and cuticulin-based cuticle
Pharynx    Genes with the following GO term:
             GO:0007631  feeding behavior
Germline   Genes with the following GO terms:
             GO:0051729  germline cell cycle switching, mitotic to meiotic cell cycle
             GO:0048477  oogenesis
             GO:0045132  meiotic chromosome segregation
             GO:0043186  P granule
             GO:0007276  gamete generation
             GO:0007281  germ cell development
             GO:0007126  meiosis
             GO:0001556  oocyte maturation
             GO:0000003  reproduction

Data for muscle and endoderm are from refs 45 and 46, respectively.
Large-scale discovery of novel genetic causes of developmental disorders


The Deciphering Developmental Disorders Study*


Despite three decades of successful, predominantly phenotype-driven
discovery of the genetic causes of monogenic disorders, up to half
of children with severe developmental disorders of probable genetic
origin remain without a genetic diagnosis. Particularly challenging
are those disorders rare enough to have eluded recognition as a discrete
clinical entity, those with highly variable clinical manifestations,
and those that are difficult to distinguish from other, very similar,
disorders. Here we demonstrate the power of using an unbiased
genotype-driven approach to identify subsets of patients with similar
disorders. By studying 1,133 children with severe, undiagnosed
developmental disorders, and their parents, using a combination of
exome sequencing and array-based detection of chromosomal
rearrangements, we discovered 12 novel genes associated with developmental
disorders. These newly implicated genes increase by 10%
(from 28% to 31%) the proportion of children that could be diagnosed.
Clustering of missense mutations in six of these newly implicated genes
suggests that normal development is being perturbed by an activating
or dominant-negative mechanism. Our findings demonstrate the value
of adopting a comprehensive strategy, both genome-wide and nationwide,
to elucidate the underlying causes of rare genetic disorders.

We established a network to recruit 1,133 children (median age 5.5 years,
Extended Data Fig. 1a) with diverse, severe undiagnosed developmental 
disorders, through all 24 regional genetics services of the UK National 
Health Service and Republic of Ireland. Among the most commonly 
observed phenotypes (Extended Data Fig. 1b and Supplementary Table 1) 
were intellectual disability or developmental delay (87% of children), 
abnormalities revealed by cranial MRI (30%), seizures (24%), and con- 
genital heart defects (11%). These children are predominantly (~90%) 
of northwest European ancestry (Extended Data Fig. 1c), with 47 pairs 
of parents (4.1%) exhibiting kinship equivalent to, or in excess of, second 
cousins (Extended Data Fig. 1d and Supplementary Information). In 
most families (849 of 1,101) the child was the only affected family member,
but 111 children had one or more parents with a similar developmental
disorder, and 124 had a similarly affected sibling (Supplementary
Information). Prior clinical genetic testing would have already diagnosed 
many children with easily recognized syndromes, or large pathogenic 
deletions and duplications, enriching this research cohort for less dis- 
tinct syndromes and novel genetic disorders. 

We sequenced the exomes of 1,133 children with developmental dis- 
orders and their parents, from 1,101 families, representing 1,071 unre- 
lated children and 30 sibships. We also performed exome-focused array 
comparative genomic hybridization (exome-aCGH) on the children 
(n = 1,009) and UK controls (n = 1,013), and genome-wide genotyping
on the trios (n = 1,006) to identify deletions, duplications, uniparental
disomy and mosaic large chromosome rearrangements. From our
exome sequencing and exome-aCGH data, we detected an average of 
19,811 coding or splicing single nucleotide variants (SNVs), 491 coding 
or splicing insertions and deletions (indels) and 148 copy number variants
(CNVs) per child (Supplementary Information). From analyses
of the genotyping array data we identified six children with uniparental
disomy and five children with mosaic large chromosomal rearrangements
(Supplementary Information). The SNVs, indels and CNVs
were analysed jointly in the following analyses, allowing, for example,
the identification of compound heterozygous CNVs and SNVs affecting
the same gene.

We discovered 1,618 de novo variants (1,417 SNVs, 114 indels and 
87 CNVs) in coding and non-coding regions (Supplementary Tables 2 
and 3), of which 1,596 (98.6%) were validated using a second, independ- 
ent assay, and the remainder were validated clinically. This represents 
an average of 1.12 de novo SNVs and 0.09 de novo indels in coding or 
splicing regions per child, which is within the range of similar studies.
The distribution of de novo SNVs and indels per child closely
approximated the Poisson distribution expected for random mutational
events (Extended Data Fig. 2).
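
To make the Poisson expectation concrete, the following sketch computes how many of the 1,133 children would be expected to carry 0, 1, 2, ... coding or splicing de novo variants. Summing the SNV and indel means into a single rate is a simplifying assumption made here for illustration, not a step described in the text.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Means per child quoted in the text; adding them assumes SNVs and indels
# arise independently (an assumption of this sketch).
MEAN_PER_CHILD = 1.12 + 0.09
N_CHILDREN = 1133

expected_children = {
    k: N_CHILDREN * poisson_pmf(k, MEAN_PER_CHILD) for k in range(6)
}
for k, n in expected_children.items():
    print(f"{k} de novo variants: about {n:.0f} children expected")
```

Comparing such expected counts with the observed histogram is one simple way to judge how closely the data follow a Poisson model.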

We classified 28% (n = 317) of children with probable pathogenic 
variants (Supplementary Table 4 and ref. 13) in 1,129 robustly impli- 
cated developmental disorder genes (published before November 2013), 
or with pathogenic deletions or duplications. Most of these diagnoses 
involved de novo SNVs, indels or CNVs (Table 1). Females had a sig- 
nificantly higher diagnostic yield of autosomal de novo mutations than 
males (P = 0.01, Fisher’s exact test). Among the single-gene diagnoses, 
most genes linked to developmental disorders (95 out of 148) were only 
observed once, although eight (ARID1B, SATB2, SYNGAP1, ANKRD11, 
SCN1A, DYRK1A, STXBP1, MED13L) each accounted for 0.5-1% of
children in our cohort (Extended Data Fig. 3). For seventeen of these 
children we identified two different genes with pathogenic variants, 
resulting in a composite clinical phenotype. 
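
The sex difference in autosomal de novo diagnoses can be checked with a small two-sided Fisher's exact test. This is a sketch using only the standard library; it assumes the comparison is autosomal de novo diagnoses versus all other children, using the counts in Table 1 (100 of 550 females, 75 of 583 males). The exact table used by the authors is not stated in the text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Assumed layout: rows are sexes, columns are (autosomal de novo, other)
p_value = fisher_exact_two_sided(100, 550 - 100, 75, 583 - 75)
print(f"two-sided P = {p_value:.3f}")
```

Under these assumptions the result is of the same order as the P = 0.01 reported in the text.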

Analyses that assess the enrichment in patients of a particular class
of variation, so-called ‘burden analyses’, both highlight classes of variants
for detailed analysis and enable estimation of the proportion of a
particular class of variant that is likely to be pathogenic. We observed
a significant (P = 0.0004) burden of 87 de novo CNVs in the 1,133
children with developmental disorders compared to 12 in 416 controls
(Scottish Family Health Study) despite most children (77%) having
previously had clinical microarray testing (Extended Data Fig. 4).

We used gene-specific mutation rates that account for gene length
and sequence context to assess the burden of different classes of de novo
SNVs and indels (Supplementary Information). We observed no significant
excess of any functional class of de novo SNVs or indels in
autosomal-recessive developmental-disorder-linked genes (Extended
Data Fig. 5), suggesting that few of these mutations are causally implicated.
By contrast, we observed a highly significant excess of all ‘functional’
classes (coding and splice site variants excepting synonymous
changes) of de novo SNVs and indels in the dominant and X-linked
developmental-disorder-linked genes (Extended Data Fig. 5) within
which de novo mutations can be sufficient to cause disease. Not all
protein-altering mutations in known dominant and X-linked developmental
disorder genes will be pathogenic, and these burden analyses
inform estimates of positive predictive values for different classes of
mutations. The remaining genes (that is, those not linked to developmental
disorder) in the genome also exhibit a more modest, but significant,
excess of functional, but not silent, de novo SNVs and indels
(Extended Data Fig. 5).


Table 1 | Breakdown of diagnoses by mode and by sex

                        Female (%)    Male (%)      Total (%)
Undiagnosed             383 (69.6)    433 (74.3)    816 (72.0)
Diagnosed               167 (30.4)    150 (25.7)    317 (28.0)
De novo mutation        124 (22.5)    80 (13.7)     204 (18.0)
  Chr X*                24 (4.4)      5 (0.9)       28 (2.6)
  Autosomal*            100 (18.2)    75 (12.9)     176 (15.5)
Autosomal dominant†     9 (1.6)       11 (1.9)      20 (1.8)
Autosomal recessive     20 (3.6)      26 (4.5)      46 (4.1)
X-linked inherited      1 (0.2)       19 (3.3)      20 (1.8)
UPD/mosaicism           4 (0.7)       6 (1.0)       10 (0.9)
Composite               9 (1.6)       8 (1.4)       17 (1.5)
Total                   550           583           1,133

UPD, uniparental disomy.
* Chromosome X (Chr X) and autosomal values are subsets of ‘De novo mutation’.
† Inherited from a parent.


*Lists of participants and their affiliations appear at the end of the paper.

We observed 96 genes with recurrent, functional mutations (Fig. 1a), 
a highly significant excess compared to the expected number derived 
from simulations (median = 55; Supplementary Information). This 
enrichment is even more pronounced (observed, 29; expected, 3) for 
recurrent loss-of-function mutations (Fig. 1b). Among undiagnosed 
children, we observed an excess of 22 genes (observed: 45, expected: 23) 
with recurrent functional mutations (Fig. 1a) and an excess of 8 genes 
(observed, 9; expected, 1) with recurrent loss-of-function mutations 
(Fig. 1b), implying that an appreciable fraction of these recurrently 
mutated genes are novel developmental-disorder-linked genes. 
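
The idea behind the simulated expectation can be sketched as follows: scatter a fixed number of mutations across genes in proportion to each gene's mutation rate, count how many genes are hit at least twice, and repeat. This is a simplified illustration only. The gene count, weights and mutation number below are hypothetical, and the study's own simulations (described in its Supplementary Information) may differ in detail.

```python
import random
from collections import Counter

def simulate_recurrent_genes(n_mutations, gene_weights, n_sims=1000, seed=0):
    """For each simulation, count the genes that receive two or more mutations."""
    rng = random.Random(seed)
    genes = range(len(gene_weights))
    results = []
    for _ in range(n_sims):
        hits = Counter(rng.choices(genes, weights=gene_weights, k=n_mutations))
        results.append(sum(1 for count in hits.values() if count >= 2))
    return results

# Hypothetical inputs: 2,000 genes with made-up relative mutation rates
weight_rng = random.Random(1)
weights = [weight_rng.uniform(0.5, 2.0) for _ in range(2000)]
null_counts = sorted(simulate_recurrent_genes(300, weights, n_sims=200))
median_expected = null_counts[len(null_counts) // 2]
```

An observed number of recurrently mutated genes far above the simulated distribution, as in Fig. 1, is then evidence that some genes are enriched for mutations beyond what their mutation rate alone predicts.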

To identify individual genes enriched for damaging de novo mutations
(Supplementary Information), we tested for a gene-specific overabundance
of either de novo loss-of-function mutations or clustered
functional de novo mutations in 1,130 children (excluding one twin
from each of three identical twin pairs). To increase power to detect
genes associated with developmental disorder, we also meta-analysed
our data with published de novo mutations from 2,347 developmental
disorder trios with intellectual disability, epileptic encephalopathy,
autism, schizophrenia, or congenital heart defects (the ‘meta-DD’
data set). These analyses (Fig. 2) successfully re-discovered 20 known
genes linked to developmental disorder at genome-wide significance
(P < 1.31 × 10⁻⁶, a Bonferroni P value of 0.05 corrected for 38,504 tests
(Supplementary Information)). Thus, despite the broad phenotypic
ascertainment in these data sets, we can robustly detect
developmental-disorder-linked genes solely on statistical grounds.
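
The Bonferroni correction behind the genome-wide threshold is a single division, sketched below. The function name is introduced here for illustration. Dividing 0.05 by 38,504 tests gives roughly 1.3 × 10⁻⁶, in line with the threshold quoted above; the same calculation with 18,272 genes gives the per-gene threshold mentioned in the Fig. 2 legend.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

meta_threshold = bonferroni_threshold(0.05, 38504)   # tests quoted in the text
fig2_threshold = bonferroni_threshold(0.05, 18272)   # genes quoted in the Fig. 2 legend

print(f"38,504 tests: {meta_threshold:.2e}")
print(f"18,272 genes: {fig2_threshold:.2e}")
```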

To increase our power to detect novel genes linked to developmental 
disorder, we repeated the gene-specific analysis described above exclud- 
ing the 317 individuals with a known cause of their developmental 
disorder. In this analysis the statistical genetic evidence was integrated 
with phenotypic similarity of patients, available data on model organ- 
isms and functional plausibility. We identified twelve novel disease genes 
with compelling evidence for pathogenicity (Table 2), nine of which 
Figure 1 | Excess of recurrently mutated genes. Each panel shows the
observed number of recurrently mutated genes (diamond) and the distribution
of the number of recurrently mutated genes in 10,000 simulations (box
indicates interquartile range, whiskers indicate 95% confidence interval)
under a model of no gene-specific enrichment of mutations. a, All
protein-altering mutations in all DDD children and undiagnosed DDD children.
b, All loss-of-function mutations in all DDD children and undiagnosed DDD
children. Each diamond is annotated with the median excess of recurrently
mutated genes, with 95% confidence intervals in brackets. P value of observed
excess is <0.0001 for all four tests. No statistical methods were used to
predetermine sample size.
Figure 2 | Gene-specific significance of enrichment for de novo mutations.
The −log10(P) value of testing for mutation enrichment is plotted only for
each gene with at least one mutation in DDD children. On the x axis is the
P value of the most significant test in the DDD data set; on the y axis is the
minimal P value from the significance testing in the meta-analysis data set. 
Red indicates genes already known to be associated with developmental 
disorders (in DDG2P). Only genes with a P value of less than 0.05/18,272 
(red lines) (where 18,272 is the number of genes tested) are labelled. 


exceeded the genome-wide significance threshold of 1.36 × 10⁻⁶
(Supplementary Information), with the remaining three genes (PCGF2,
DNM1 and TRIO) just below this significance threshold. The two children
with identical Pro65Leu mutations in PCGF2, which encodes a
component of a Polycomb transcriptional repressor complex, share a
strikingly similar facial appearance representing a novel and distinct
dysmorphic syndrome. DNM1 was previously identified as a candidate
gene for epileptic encephalopathy. Two of the three children that we
identified with DNM1 mutations also had seizures, and a heterozygous
mouse mutant manifests seizures. In addition to two de novo missense
SNVs in TRIO, we identified an intragenic de novo 82-kilobase
(kb) deletion of 16 exons. For several of these novel
developmental-disorder-linked genes, the meta-DD analysis increased the
significance of enrichment. For example, a total of five de novo loss-of-function
variants in POGZ were identified, two from our cohort, two from
recent autism studies and one from a recent schizophrenia study.
We also identified six genes with suggestive statistical evidence of being
novel genes associated with developmental disorder, defined as having
a P value for mutation enrichment less than 1 × 10⁻⁴ and being plausible
from a functional perspective (Extended Data Table 1). We anticipate
that most of these genes will eventually accrue sufficient evidence
to meet the stringent criteria we defined above for declaring a novel
developmental-disorder-linked gene.

Notably, we observed identical missense mutations in unrelated, 
phenotypically similar patients for four of these novel developmental- 
disorder-linked genes (PCGF2, COL4A3BP, PPP2R1A and PPP2R5D), 
and for a fifth gene, BCL11A, we identified highly significant clustering 
of non-identical missense mutations (Fig. 3). We hypothesize that the 
mutations in some of these genes may be operating by either dominant- 
negative or activating mechanisms. This hypothesis is supported by 
previous functional evidence for several of the mutated amino acids. 
The three identical Ser132Leu mutations in COL4A3BP, which encodes 
an intracellular transporter of ceramide, remove a serine that when 
phosphorylated downregulates transporter activity from the ER to the 
Table 2 | Novel genes with compelling evidence for a role in developmental disorder

Evidence                            Gene      De novo DDD      De novo meta     P value        Test  Mutation    Predicted
                                              (missense, LOF)  (missense, LOF)                       clustering  haploinsufficiency (%)
De novo enrichment                  COL4A3BP  3 (3,0)          5 (5,0)          4.10 × 10⁻¹²   Meta  Yes         14.7
                                    PPP2R5D   4 (4,0)          5 (5,0)          6.01 × 10⁻¹²   DDD   Yes         19.7
                                    ADNP      4 (0,4)          5 (0,5)          4.59 x 1071    Meta  No          9.8
                                    POGZ      2 (0,2)          5 (0,5)          431 x 107°     Meta  No          30.0
                                    PPP2R1A   3 (3,0)          3 (3,0)          2.03 × 10⁻⁸    DDD   Yes         23.5
                                    DDX3X     4 (3,1)          5 (3,2)          2.26 × 10⁻⁷    DDD   No          12.7
                                    CHAMP1    2 (0,2)          3 (0,3)          4.58 × 10⁻⁷    Meta  No          52.9
                                    BCL11A    3 (3,0)          4 (3,1)          03 x 107°      DDD   Yes         0.6
                                    PURA      3 (1,2)          3 (1,2)          14 x 107°      DDD   No          9.4
De novo enrichment plus additional  DNM1      3 (3,0)          5 (5,0)          43 x10°°       Meta  (e)         13.5
evidence                            TRIO      2 (2,0)          7 (7,0)          5.16 x 10°     Meta  Yes         25.7
                                    PCGF2     2 (2,0)          2 (2,0)          08 x 10-8      DDD   Yes         37.7

The table summarizes the 12 genes with compelling evidence to be novel
developmental-disorder-linked genes. The number of unrelated patients with
independent functional or loss-of-function (LOF) mutations in the Deciphering
Developmental Disorders (DDD) cohort or the wider meta-analysis (meta) data set
including DDD patients is listed. The P value reported is the minimum P value
from the testing of the DDD data set and the meta-analysis data set. The data
set that gave this minimal P value is also reported. Mutations are considered
to be clustered if the P value of clustering of functional SNVs is less than
0.01. Predicted haploinsufficiency is reported as a percentile of all genes in
the genome, with ~0% being highly likely to be haploinsufficient and 100% very
unlikely to be haploinsufficient, based on the prediction score described
in ref. 26 updated to enable predictions for a higher fraction of genes in the
genome. During submission, a paper was published describing a novel
developmental disorder caused by mutations in ADNP (ref. 27).


Golgi, presumably resulting in intracellular imbalances in ceramide
and its downstream metabolic pathways. The two mutated amino acids
(Arg182Trp and Pro179Leu) in PPP2R1A, which encodes the scaffolding
A subunit of the protein phosphatase 2 complex, have been previously
identified as sites of driver mutations in endometrial and ovarian
cancer. It has previously been shown that mutating either of these two
residues results in impaired binding of B subunits of the complex.
Intriguingly, PPP2R5D encodes one of the possible B subunits of the
same protein phosphatase 2 complex, suggesting that the clustered
missense mutations (Pro201Arg and Glu198Lys) in this gene may similarly
perturb interactions between subunits of this complex. Further
functional studies will be required to confirm this hypothesis.

We assessed transmission biases of potentially pathogenic inherited 
SNVs in our probands (Supplementary Information) and observed a 
genome-wide trend (P = 0.015) towards over-transmission to probands 
of very rare (minor allele frequency (MAF) <0.0005%) loss-of-function 
variants, but not damaging missense variants. We also observed a 1.8- 
fold enrichment (P = 0.04) of rare (MAF <5%) biallelic loss-of-function 
variants (Supplementary Table 5) among probands without a likely dom- 
inant cause of their disorder, compared to those with either a diagnostic 
de novo mutation or an affected parent. Again we saw no enrichment in 
biallelic damaging missense variants (Extended Data Table 2), consistent
with a similar observation in children with autism. These observations
suggest that although inherited loss-of-function variants (both
monoallelic and biallelic) are probably contributing to developmental 
disorder in our patients, much larger sample sizes will be required to 
pinpoint specific developmental-disorder-linked genes in this way. 

To direct future, detailed functional experiments on the develop- 
mental role of a subset of candidate genes from this study we used two 
approaches. First, knockdown-induced phenotypes were recorded in 
early zebrafish development. Second, we performed a systematic review
of perturbed gene function in human, mouse, Xenopus, zebrafish and
Drosophila. In both approaches the animal phenotypes were compared 
to those seen in individuals in our cohort. 

We undertook an antisense-based loss-of-function screen in zebra- 
fish to assess 32 candidate developmental-disorder-linked genes with 
de novo loss-of-function, de novo missense or biallelic loss-of-function 
variants from exome sequencing (Supplementary Information and Sup- 
plementary Table 6). These candidate genes corresponded to 39 zebra- 
fish orthologues. Knockdowns of these zebrafish genes were repeated 
at least twice and all morpholinos were co-injected with tp53 morpho- 
lino to eliminate off-target toxicity. Successful knockdown of the targeted 
messenger RNA could be confirmed using polymerase chain reaction 
with reverse transcription (RT-PCR) for 82.4% of genes (28 out of 34), 
and 9 out of 11 (82%) of genes that were tested gave an equivalent pheno- 
type when knocked down by a second, independent morpholino. Knock- 
down of at least one or a pair of zebrafish orthologues of 65.6% of 
candidate developmental-disorder-linked genes (21 out of 32) resulted 
in perturbed embryonic and larval development (Fig. 4, Extended Data 
Table 3, Supplementary Data and Supplementary Table 7). Large-scale 
mutagenesis and morpholino studies suggest that knockout or knockdown
of 6-12% of genes gives developmental phenotypes, suggesting at
least a fivefold enrichment of developmentally non-redundant genes 
among the 32 selected for modelling. We then compared the phenotypes 
of the zebrafish morphants to those of the patients with de novo muta- 
tions or biallelic loss-of-function variants in the orthologous genes 
(Extended Data Table 3). Eleven out of twenty-one (52.4%) of the genes 
were categorized as strong candidates based on phenotypic similarity 
(Fig. 4a). Seven out of eleven were potential microcephaly genes, the 
knockdown of which in zebrafish gives significant reductions in both 
head measurements and neural tissue (Fig. 4b and Supplementary Infor- 
mation). Six out of twenty-one (28.6%) genes resulted in severe morphant 
Figure 3 | Four novel genes with clustered mutations. The domains (blue),
post-translational modifications, and mutation locations (red stars) are shown
for four proteins with highly clustered de novo mutations in unrelated children
with severe, undiagnosed developmental disorders. For the protein PCGF2,
where all observed mutations are identical, photos are shown to highlight the
facial similarities of patients carrying the same mutation.
Figure 4 | Candidate gene loss-of-function modelling in zebrafish reveals
enrichment for developmentally important proteins. a, Examples of 
developmental phenotypes: knockdown of pkn2a results in reduced 
cartilaginous jaw structures (black arrows); knockdown of fryl results in cardiac 
and craniofacial defects (white arrowheads and arrows, respectively); while 
knockdown of psmd3 results in smaller ear primordia (red arrows), and 
mis-patterned CNS neurons (compare red double arrows and brackets). MO, 
morpholino. b, Knockdown outcomes of seven genes with variants present in 
microcephaly patients: interocular measurements of bright-field images from 


phenotypes which could not be meaningfully linked to patient pheno- 
types. As many of our candidate developmental disorder genes carried 
heterozygous loss-of-function variants (de novo mutations), it is to be 
expected that the severity of loss-of-function phenotypes in zebrafish 
may exceed that observed in our patient cohort. The genes with proven 
non-redundant developmental roles can reasonably be assigned higher 
priority for downstream functional investigations and genetic analyses. 
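
The "at least fivefold" enrichment quoted for the zebrafish screen follows from comparing the observed hit rate (21 of 32 candidate genes) with the 6-12% background rate cited from large-scale studies. The short calculation below is only an arithmetic check of those figures from the text.

```python
observed_fraction = 21 / 32          # candidate genes with a knockdown phenotype
background_range = (0.06, 0.12)      # 6-12% of genes, as cited in the text

# The most conservative estimate divides by the highest background rate
fold_low = observed_fraction / background_range[1]
fold_high = observed_fraction / background_range[0]

print(f"observed fraction: {observed_fraction:.1%}")
print(f"fold enrichment: {fold_low:.1f} to {fold_high:.1f}")
```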

Our systematic review of gene perturbation in multiple species sought 
both confirmatory and contradictory (for example, healthy homozygous 
knockout) evidence from other animal models for these 21 apparently 
developmentally important genes. We identified 16 genes with solely 
confirmatory data, often from multiple different organisms, none with 
solely contradictory data, two with both confirmatory and contradict- 
ory evidence, and three with no evidence either way (Supplementary 
Table 8). 

In summary, our analyses validate a large-scale, genotype-driven 
strategy for novel developmental-disorder-linked gene discovery that 
is complementary to the traditional phenotype-driven strategy of study- 
ing patients with very similar presentations, and is particularly effective 
for discovering novel developmental disorders with highly variable or 
indistinct clinical presentations. Our meta-analysis with previously pub- 
lished developmental disorder studies increased power to detect novel 
developmental-disorder-linked genes and highlights the shared genetic
aetiologies between diverse neurodevelopmental disorders such as
intellectual disability, epilepsy, autism and schizophrenia. We identified
significantly more pathogenic autosomal de novo mutations in
females compared to males. An increased burden of monogenic disease
among females with neurodevelopmental disorders has become more
apparent, and our observations strengthen this proposition. Further
investigations are required to assess whether males might be enriched 
for poly/oligogenic causation. 

The 35 patients with pathogenic mutations in the 12 novel
developmental-disorder-linked genes we discovered increased our diagnostic
yield from 28% to 31%. This raises the question of what are the causes
of the developmental disorders in the other 69% of patients. The undiagnosed
patients are not obviously less severely affected than the diagnosed
patients (for example, fewer phenotype terms, older age of recruitment).
We anticipate that there are many more pathogenic, monogenic, coding
mutations in these undiagnosed patients that we have detected, but
for which compelling evidence is currently lacking. This hypothesis is
control and loss-of-function embryos reveal significant decreases in head size.
A neuronal antibody stain (anti-HuC/D, green channel) labels the brains of
control and morphant zebrafish. Measurements taken across the widest extent
of the midbrain identify significant reductions in brain size, probably
underlying the concomitant head-size reductions seen in bright-field images.
In b, tables show average percentage reduction in head and brain width,
and P values of a t-test. Original magnifications: a, 5× (pkn2a), 10× (fryl
and psmd3, bright-field) and 40× (psmd3, green channel); b, 10× (bright-field)
and 20× (green channel).


supported by four strands of evidence: (1) modelling statistical power 
suggests that studying ~1,000 trios has only 5-10% power to detect 
an averagely mutable haploinsufficient developmental-disorder-linked 
gene (Extended Data Fig. 6a and Supplementary Information); (2) the 
expectation that our power to detect novel developmental-disorder-linked 
genes that operate recessively or by gain-of-function mechanisms will 
be lower than for haploinsufficient genes; (3) the significant enrichment
in undiagnosed patients of functional mutations in genes predicted to 
exhibit haploinsufficiency (Extended Data Fig. 6b); and (4) the strong 
enrichment for developmental phenotypes in the zebrafish knock- 
down screen. 
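
The diagnostic-yield figures quoted above (28% rising to 31%, a relative increase of about 10%) can be reproduced from the cohort counts given in the text: 317 children diagnosed initially, 35 further patients explained by the 12 novel genes, and 1,133 children in total. The calculation assumes the 35 patients do not overlap with the 317, which is consistent with the novel-gene analysis having excluded diagnosed individuals.

```python
COHORT = 1133
diagnosed_before = 317   # children with a diagnosis in known genes
newly_diagnosed = 35     # patients explained by the 12 novel genes

yield_before = diagnosed_before / COHORT
yield_after = (diagnosed_before + newly_diagnosed) / COHORT
relative_gain = newly_diagnosed / diagnosed_before

print(f"yield before: {yield_before:.1%}")
print(f"yield after:  {yield_after:.1%}")
print(f"relative increase in diagnoses: {relative_gain:.1%}")
```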

Given our limited power to detect pathogenic mutations that act 
through dominant-negative or activating mechanisms, it was notable 
that in four of our novel genes (COL4A3BP, PPP2R1A, PPP2R5D and
PCGF2) we observed identical de novo mutations in unrelated trios.
Two hypotheses might explain this observation. First, that there is a
vast number of different gain-of-function mutations, of which we are
just scratching the surface in this study, or second, that these particular
variants are enriched in our cohort owing to these mutations conferring
a positive selective advantage in the germ line. Analysis of larger
data sets will be required to assess these hypotheses, although they are 
not necessarily mutually exclusive. 

These considerations of the limited power of even nationwide studies 
such as ours motivate the international sharing of minimal genotypic 
and phenotypic data, for example through the DECIPHER web 
portal (http://decipher.sanger.ac.uk), to provide diagnoses for patients 
who would otherwise remain undiagnosed. Plausibly pathogenic variants 
observed in undiagnosed patients in our study (de novo SNVs, 
indels and CNVs, and biallelic loss of function in genes not yet associated 
with disease) are shared through DECIPHER, and we encourage other, 
comparable studies to adopt a similar approach. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
Extended Data Figure 1 | Characteristics of the families. a, Gestation-adjusted 
decimal age (years) at last clinical assessment. The histogram shows 
the distribution of the gestation-adjusted decimal age at last clinical assessment 
across the 1,133 probands. The dashed red line shows the median age. 
b, Frequency of human phenotype ontology (HPO) term usage. Bar plot 
showing, for each used HPO term, the number of times it was observed across 
the 1,133 proband patient records. c, Projection PCA plot of the 1,133 
probands. PCA plot of 1,133 DDD probands projected onto a PCA analysis 
using four different HapMap populations from the 1000 genomes project. 
Black, African; red, European; green, east Asian; blue, south Asian; the 
1,133 DDD probands are represented by orange triangles. d, Self-declared and 
genetically defined consanguinity. Overlaid histogram showing the distribution 
of kinship coefficients from KING comparing parental samples for each trio. 
Green, trios where consanguinity was not entered in the patient record on 
DECIPHER; red, trios where consanguinity was declared in the patient record on 
DECIPHER. 
Extended Data Figure 2 | Number of validated de novo SNVs and indels per proband. Bar plot showing the distribution of the observed number of validated 
SNVs and indels per proband sample, and the expected distribution assuming a Poisson distribution with the same mean as the observed distribution. 
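
The comparison described in this caption can be sketched as follows: given a list of per-proband mutation counts, the expected number of probands carrying k mutations under a Poisson model with the same mean is n·exp(−m)·m^k/k!. The function name and the toy counts below are hypothetical and are used only to show the calculation; they are not data from the study.

```python
import math
from collections import Counter


def poisson_expected_counts(per_proband_counts):
    """Expected number of probands with k = 0..max mutations under a
    Poisson model whose mean equals the observed mean."""
    n = len(per_proband_counts)
    mean = sum(per_proband_counts) / n
    max_k = max(per_proband_counts)
    return [n * math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(max_k + 1)]


if __name__ == "__main__":
    # Toy, made-up counts for eight probands (illustration only).
    toy_counts = [0, 1, 1, 2, 0, 1, 3, 1]
    observed = Counter(toy_counts)
    expected = poisson_expected_counts(toy_counts)
    for k, e in enumerate(expected):
        print(k, observed.get(k, 0), round(e, 2))
```

Plotting the observed and expected columns side by side gives the kind of bar plot the caption describes.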
Extended Data Figure 3 | Number of diagnoses per gene. Histogram showing the number of diagnoses per gene for genes with at least two diagnoses from 
different proband samples. 
Extended Data Figure 4 | Burden of large CNVs in 1,133 DDD proband 
samples. Plot comparing the frequency of rare CNVs in three sample groups 
against CNV size. The y axis is on a log scale. Red, DDD probands who have not 
had previous microarray-based genetic testing; purple, DDD probands who 
have had negative previous microarray-based genetic testing; green, DDD 
controls. 
Extended Data Figure 5 | Expected and observed numbers of de novo 
mutations. The expected and observed numbers of mutations of different 
functional consequences in three mutually exclusive sets of genes are shown, 
along with the P value from an assessment of a statistical excess of observed 
mutations. The three classes of genes are described in the main text. 


Extended Data Figure 6 | Haploinsufficiency analyses. a, Saturation analysis 
for detecting haploinsufficient developmental-disorder-linked genes. A box 
plot showing the distribution of statistical power to detect a significant 
enrichment of loss-of-function mutations across 18,272 genes in the genome, 
for different numbers of trios studied, from 1,000 trios to 12,000 trios. Line 
within the box shows the median, box shows the interquartile range and the 
whiskers show the most extreme values within 1.5 times the interquartile range 
from the box. b, Distribution of haploinsufficiency scores in selected sets of de 
novo mutations. Violin plot of haploinsufficiency scores in five sets of de novo 
mutations. Silent, all synonymous mutations; diagnostic, mutations in known 
developmental-disorder-linked genes in diagnosed individuals; 
undiagnosed_Func, all functional mutations in undiagnosed individuals; 
undiagnosed_LoF, all loss-of-function mutations in undiagnosed individuals; 
undiagnosed_recur, mutations in genes with recurrent functional mutations in 
undiagnosed individuals. P values for a Mann-Whitney U-test comparing each 
of the latter four distributions to that observed for the silent (synonymous) 
variants are plotted at the top of each violin. Dot indicates the median, box is 
interquartile range and whiskers are the most extreme values within 1.5 times 
the interquartile range from the box. 
Extended Data Table 1 | Novel genes with suggestive evidence for a role in developmental disorder 

Evidence: De novo enrichment + additional evidence 

Gene | DDD de novos (Missense, LoF) | Meta de novos (Missense, LoF) | P value | Test | Mutation clustering | Predicted haploinsufficiency 
NAA15 | 1 (0,1) | 3 (0,3) | 1.64E-06 | Meta | No | 7.5% 
ZBTB20 | 3 (1,2) | 3 (1,2) | 4.84E-06 | DDD | No | 0.2% 
NAA10 | 2 (2,0) | 3 (3,0) | 8.28E-06 | Meta | No | 34.1% 
TRIP12 | 3 (1,2) | 4 (2,2) | 2.13E-05 | Meta | No | 3.8% 
USP9X | 3 (1,2) | 3 (1,2) | 5.14E-05 | DDD | No | 3.8% 
KAT6A | 2 (0,2) | 2 (0,2) | 7.91E-05 | DDD | No | 19.0% 


Six genes with suggestive evidence to be novel developmental-disorder-linked genes. The number of unrelated patients with independent functional or loss-of-function mutations in the DDD cohort or the wider 
meta-analysis data set including DDD patients is listed. The P value reported is the minimum P value from the testing of the DDD data set and the meta-analysis data set. The data set that gave this minimal P value is 
also reported. Mutations are considered to be clustered if the P value of clustering of functional SNVs is less than 0.01. Predicted haploinsufficiency is reported as a percentile of all genes in the genome, with ~0% 
being highly likely to be haploinsufficient and 100% very unlikely to be haploinsufficient, based on the prediction score described in ref. 26 updated to enable predictions for a higher fraction of genes in the 
genome. NAA10 is already known to cause an X-linked recessive developmental disorder in males, but here we identified missense mutations in females, suggesting a different, X-linked dominant, disorder. 
Extended Data Table 2 | Biallelic loss of function and damaging functional variants 

Biallelic variant types | Untransmitted diplotypes (n=1080) | Likely dominant probands (n=270) | Other probands (n=810) 
LoF/LoF (Genome-wide) | 110 | 17 | 86 
LoF/Dam (Genome-wide) | 87 | 21 | 71 
Dam/Dam (Genome-wide) | 312 | 90 | 264 
LoF/LoF (DDG2P Biallelic) | 1 | 1 | 3 
LoF/Dam (DDG2P Biallelic) | 2 | 0 | 6 
Dam/Dam (DDG2P Biallelic) | 26 | 7 | 25 

Rare (MAF <5%) biallelic loss of function and damaging functional variants in uninherited diplotypes and probands. ‘Likely dominant probands’ refers to probands with a reported de novo mutation or affected 
parents, and ‘other probands’ refers to all remaining probands. ‘DDG2P biallelic’ refers to confirmed and probable DDG2P genes with a biallelic mode of inheritance. See Supplementary Methods for details of 
variant processing. 
Extended Data Table 3 | Zebrafish modelling identifies 21 developmentally important candidate genes 

Gene | # patients | Variant | Patient phenotypes | Phenotypic concordance | Relevant knockdown phenotypes 
BTBD9 | 2/1 | Biallelic LoF/De novo Missense | Seizures, microcephaly, hypertonia | Strong | Reduced head size, brain volume 
CHD3 | 1/2 | De novo LoF/Missense | CNS and craniofacial defects | Strong | Abnormal head shape 
DDX3X | 1/3 | De novo LoF/Missense | Moderately short stature, microcephaly, CNS defects | Strong | Reduced head size, brain volume 
ETF1 | 1 | De novo LoF | CNS and craniofacial defects, seizures, microcephaly, hypertelorism | Strong | Reduced head size, brain volume 
FRYL | 1 | De novo LoF | Short stature, craniofacial and cardiac defects | Strong | Cardiac defects, reduced axis length 
PKN2 | 4 | De novo Missense | CNS, cardiac, ear, and craniofacial defects, growth retardation | Strong | Cardiac, craniofacial cartilage, and growth defects 
PSMD3 | © | De novo Missense | Microcephaly, muscular hypotonia, seizures, growth abnormality | Strong | Reduced head size and neural defects 
SCGN | 1 | Biallelic LoF | Seizures, microcephaly, CNS defects | Strong | Reduced head size, brain volume 
SETD5 | a | De novo LoF | Seizures, CNS and cardiac defects, poor motor coordination | Strong | Reduced head size, cardiac defects, abnormal locomotion 
THNSL2 | 2 | Biallelic LoF | Microcephaly, CNS and ear defects | Strong | Reduced head size, brain volume, neural defects 
ZRANB1 | 2 | De novo Missense | Microcephaly, muscle defects, seizures | Strong | Reduced head size and neural defects 
DPEP2 | a | Biallelic LoF | CNS defects, growth retardation | Moderate | Growth reduction 
PSD2 | 1 | De novo LoF | CNS defects, hypertonia, seizures | Moderate | Abnormal musculature, CNS and locomotion 
SAP130 | 1 | De novo LoF | Short stature, hypotonia, hypotelorism | Moderate | Abnormal locomotion 
CNOT1 | 1/1 | De novo LoF/Missense | Short stature, cardiac, CNS, ear and craniofacial defects | Weak | Multisystem 
DTWD2 | t | De novo LoF | CNS defects, seizures | Weak | Multisystem 
ILVBL | 1 | De novo LoF | CNS and craniofacial defects | Weak | Multisystem 
NONO | 1 | De novo LoF | CNS and ear defects, hypotonia, growth retardation | Weak | Multisystem, with otic and growth defects 
POGZ | 2 | De novo LoF | CNS and ear defects, hypotonia, seizures, coloboma | Weak | Multisystem 
SMARCD1 | 1/1 | De novo LoF/Missense | CNS defects, hypotonia | Weak | Multisystem 
WWC1 | i | De novo Missense | CNS defects, hypertelorism | None | None 


This table summarizes the 21 genes for which knockdown results in developmental phenotypes in zebrafish. The ‘# patients’ column indicates how many patients were identified as carrying variants in these 
genes. Split numbers indicate the breakdown of variant types (for example, for BTBD9, 2/1 is two biallelic loss of function and one de novo missense carrying patients). A summary of the patient phenotypes is listed, 
as well as the relevant phenotypes observed in zebrafish knockdown experiments. Phenotypic concordance categories indicate the degree of overlap between the zebrafish phenotyping and the patient 
phenotypes. Weak concordance typically is the result of severe, multisystem phenotypes in zebrafish. See Supplementary Information for more detailed phenotype information. 
Orientation columns in the mouse superior colliculus 


Evan H. Feinberg & Markus Meister 


More than twenty types of retinal ganglion cells conduct visual 
information from the eye to the rest of the brain. Each retinal ganglion 
cell type tessellates the retina in a regular mosaic, so that every 
point in visual space is processed for visual primitives such as contrast 
and motion. This information flows to two principal brain 
centres: the visual cortex and the superior colliculus. The superior 
colliculus plays an evolutionarily conserved role in visual behaviours, 
but its functional architecture is poorly understood. Here we report 
on population recordings of visual responses from neurons in the 
mouse superior colliculus. Many neurons respond preferentially to 
lines of a certain orientation or movement axis. We show that cells 
with similar orientation preferences form large patches that span 
the vertical thickness of the retinorecipient layers. This organization 
is strikingly different from the randomly interspersed orientation 
preferences in the mouse’s visual cortex; instead, it resembles the 
orientation columns observed in the visual cortices of large mammals. 
Notably, adjacent superior colliculus orientation columns 
have only limited receptive field overlap. This is in contrast to the 
organization of visual cortex, where each point in the visual field 
activates neurons with all preferred orientations. Instead, the superior 
colliculus favours specific contour orientations within ~30° 
regions of the visual field, a finding with implications for behavioural 
responses mediated by this brain centre. 

We exposed the mouse superior colliculus (SC) for chronic brain 
imaging while leaving cortex intact (see Methods; Fig. 1a-c and 
Extended Data Fig. 1) and delivered the calcium indicator GCaMP6s as a 
neural activity reporter. Awake mice head-fixed on a circular treadmill 
viewed stimuli on a tangent screen, while neuronal responses were 
monitored by two-photon microscopy (Fig. 1d). The focal plane of the 
microscope was roughly parallel to the surface of the SC and its 
retinotopic map of visual space. The screen displayed thin bars drifting 
along their short axes (Fig. 1d), a stimulus that elicits orientation- or 
axis-tuned responses from many cells in the SC of anaesthetized mice, 
albeit by mechanisms that appear distinct from those of cortical 
neurons. The animals remained stationary on most stimulus trials (72%, 
n = 5 animals, 208 stimulus blocks), and the fraction of stationary trials 
did not vary across visual stimuli (P = 0.99, Kruskal-Wallis test). 
Consequently, we report measurements from all trials regardless of 
locomotion; these results differ only subtly from those obtained by excluding 
running trials. 

Neurons in the upper layers of the SC responded to drifting bars with 
large and reproducible transients in fluorescence that were often stronger 
to certain bar orientations than to others, consistent with previous 
reports of orientation tuning (Fig. 1e, f). Unexpectedly, neighbouring 
neurons frequently displayed remarkably similar response profiles 
(Fig. 2a, b). Volumes of the SC (150 µm (anterior-posterior) × 280 µm 
(medial-lateral) × 40-80 µm (dorsal-ventral)) were analysed in several 
animals, with a focus on fields of view containing multiple groups 
of cells with different preferred orientations (Fig. 2c). The preferred 
orientations of orientation-selective cells separated horizontally by short 
distances (<100 µm) were much more alike than expected by chance 
(Fig. 2d and Extended Data Fig. 4a-c; mean Δθ ± s.e.m., 28.6° ± 0.7; 
median Δθ, 20.5° for 1,139 cell pairs), whereas preferred orientations 
of cells separated by greater distances (150-250 µm) were much less 
alike than expected by chance (Fig. 2d and Extended Data Fig. 4a-c; 
mean Δθ ± s.e.m., 57.8° ± 1.2; median Δθ, 63.6° for 440 cell pairs). 
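
The pairwise measure used above can be sketched as follows. Because orientation is periodic over 180°, the difference between two preferred orientations is folded into the range 0-90°, and a uniformly random pairing gives an average of 45°. The function names are hypothetical, and the folding rule is an assumption based on the stated 0°, 45° and 90° reference values rather than a description of the authors' code.

```python
def orientation_difference(theta_a, theta_b):
    """Absolute difference between two orientations (degrees),
    folded into [0, 90] because orientation repeats every 180 degrees."""
    d = abs(theta_a - theta_b) % 180.0
    return min(d, 180.0 - d)


def chance_level(step_deg=1):
    """Mean difference between a fixed orientation and orientations
    sampled evenly over 0-180 degrees (the 'expected by chance' value)."""
    angles = range(0, 180, step_deg)
    return sum(orientation_difference(0.0, a) for a in angles) / len(angles)


if __name__ == "__main__":
    print(orientation_difference(10.0, 170.0))  # orientations 20 degrees apart
    print(chance_level())
```

Averaging this quantity over cell pairs binned by horizontal distance gives curves of the kind summarized in Fig. 2d, e.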


Figure 1 | Calcium imaging in awake mouse 
superior colliculus reveals orientation tuning. 
a, Schematic of mouse cerebral anatomy. The 
SC lies beneath visual cortex (V1) and the 
transverse sinus (TS). GCaMP6s-expressing SC is 
labelled in green. b, Schematic of cerebral anatomy 
after insertion of a triangular plug to reveal 
~15-25% of the SC. c, Exposed portion of the SC. 
d, Schematic of experimental setup. Mice are 
head-fixed on a turntable and free to run. Visual 
stimuli are presented on a tangent screen while 
two-photon calcium imaging is used to record 
neural population activity. PMT, photomultiplier 
tube. e, f, Average fluorescence signal ΔF/F ± s.d. 
of two SC neurons to 7 repetitions each of 8 
directions of bar motion or a blank screen. 
Insets are polar plots of the peak responses. 
Figure 2 | Patches of neurons with similar orientation tuning in the 
superior colliculus. a, b, Two fields of cells in the SC (upper left panels) 
and corresponding elliptical regions of interest (lower left panels). Responses of 
the cells to drifting bars are overlaid in polar plots (right panels), normalized 
to peak responses. Scale bar, 20 µm. c, All orientation-tuned cells within a 
volume in the SC are plotted as spheres and colour-coded according to 
preferred orientations. L, lateral; P, posterior; S, superficial. d, e, Absolute value 
of the difference in preferred orientations plotted against horizontal distance 
in the SC (d) for 7 volumes in 6 animals (n = 269 cells, 2,077 pairs) and in 
V1 (e) for 3 volumes in 3 animals (n = 104 cells, 890 pairs). 0° and 90° 
correspond to identical and orthogonal orientation preferences, respectively, 
whereas 45° would be expected by chance (dashed grey line). Orange lines, 
means for 25-µm bins ± s.e.m. 


These results were unexpected because the input to the SC from the 
retina is not thought to carry an orientation bias, even though individual 
retinal ganglion cells can be orientation-tuned. We first considered the 
effects of optical projection from the flat tangent screen, which can alter 
the apparent width of a bar depending on its orientation, but the preferred 
orientations were not consistently biased towards or away from 
radial orientations (Extended Data Figs 1 and 2 and Supplementary 
Discussion). To confirm that the effects observed were not artefacts of 
our experimental system, we modified the surgical procedure to deliver 
GCaMP6s to both the SC and primary visual cortex (V1) and visualize 
both areas. Drifting bar stimuli elicited orientation-tuned responses from 
neurons in V1 (Extended Data Fig. 3) that were often sharper than in the 
SC (mean orientation selectivity index (see Methods) of orientation-tuned 
neurons: 0.39 (V1) vs 0.31 (SC), P = 0.003), suggesting that surgical 
exposure of the SC spares V1 function. However, unlike in the SC, 
the arrangement of V1 neurons bore no relationship to their preferred 
orientation and was indistinguishable from chance (Fig. 2e and 
Extended Data Fig. 4d-f). Moreover, SC and V1 neurons had overlapping 
receptive fields, suggesting that the pattern observed in the SC is likely 
not inherited from the retina. This side-by-side comparison indicates 
that the orientation patches in the SC are not artefacts of the stimulus 
or imaging paradigms. Instead, the organization of orientation selectivity 
in the mouse SC differs substantially from that in visual cortex. 

To probe the functional organization of the SC perpendicular to the 
brain surface we imaged neural responses at different depths. Pairs of 
neurons separated by <25 µm horizontally tended to share preferred 
orientations, regardless of depth separation, at least over 80 µm depth 
(Fig. 3a, b). In the deeper SC, fluorescence signals often became dimmer 
and less sharp, precluding efficient motion correction and accurate 
correction for neuropil contamination for single cells (Methods). Because 
SC cells and the surrounding neuropil are tuned alike (Extended Data 
Fig. 5), bulk fluorescence signals offer useful proxies for local orientation 
tuning that allowed analysis of deeper volumes (Fig. 3c). Orientation 
tuning was similar between slices of the same vertical column 
over depth separations up to 260 µm, but significantly different for slices 
drawn from different columns (Fig. 3d; P < 0.001, Kruskal-Wallis test). 
These results suggest that vertical columns of cells with similar orientation 
preferences span the retinorecipient SC (Fig. 3c, d). 

To generate larger maps of orientation tuning in the SC, we turned to 
a complementary wide-field method: optical imaging of intrinsic signals. 
Because most cells preferred orientations close to the cardinal 
axes (Extended Data Fig. 6), horizontal and vertical bars were used for 
intrinsic imaging experiments. Large patches (>200 µm diameter) of 
the SC preferred either horizontal or vertical bars (Fig. 4a). This arrangement 
was grossly reproducible across animals. The medial and lateral 
parts of the exposed SC, corresponding to the superior visual field and 
elevations close to the horizon, respectively, tended to prefer vertical 
bars, whereas the intervening area, which surveys intermediate elevations, 
tended to prefer horizontal bars (Fig. 4a-c). The orientation 
maps showed no relationship to the distortions introduced by the flat 
screen, and were robust to variations in the widths and velocities of the 
bars (Extended Data Fig. 7). These patches were reminiscent of the 
patterns observed in visual cortices of other mammalian species. Attempts 
were made with limited success to aspirate the overlying cortex 
and blood vessels to expose more of the SC. In one instance, a small 
patch extending further anterior and lateral could be imaged, and the 
stereotyped orientation patches appeared in the expected locations. 
Figure 3 | Orientation patches form vertical columns. a, A cylinder centred 
on each neuron was projected through the volume and the similarity of its 
preferred orientation to that of each cell in the cylinder was determined. 
b, Difference in preferred orientations plotted against vertical distance. Orange 
lines, means for 20-µm bins ± s.e.m. Dashed grey line indicates chance. 
Differences across depths were not significant (P > 0.1, Kruskal-Wallis test; 
n = 5 animals, 397 pairs). c, Signals are averaged within 75 × 150 µm slices on 
the medial and lateral edges of an image plane, separated horizontally by 
130 µm, at several depths along the vertical axis. d, Difference in preferred 
orientations plotted against vertical distance for orientation-tuned slices along 
the same vertical column (black dots) or adjacent vertical columns (grey dots). 
Lines indicate medians for 50-µm bins. M, medial slice; L, lateral slice. Data 
from 4 mice. 
Figure 5 | Inhomogeneous coverage of contour orientation in the superior 
colliculus. a, Stimulus used to map retinotopy. b-d, Activation stripes elicited 
by the grating slit presented at positions 1 and 2 from a. Darker areas are 
more strongly activated. The reflectance change ΔR/R from black to white is 
2 × 10-7 in both panels. e, Orientation map (from Fig. 4c). V, vertical; H, 
horizontal. f, Inhomogeneity indices for 3 mice from Fig. 4a-c for sampling 
windows 100-400 µm wide. Each black line corresponds to one mouse; 
each grey line corresponds to data from a shuffled orientation map. Error bars, 
s.e.m. g, Flashed spot stimulus used to map receptive fields in two-photon 
experiments. h, Receptive field centres (dots) for orientation-tuned cells in a 
field of view. Ellipses indicate two-dimensional Gaussian receptive field fits 
Figure 4 | Intrinsic imaging reveals larger orientation maps. a, Orientation 
map of the SC. Red areas respond more strongly to vertical bars and blue areas 
respond more strongly to horizontal bars. A, anterior; L, lateral; M, medial; 
P, posterior. The reflectance change ΔR/R from red to blue is 6 X 10° * in a and 
c, 4 X 10 -* in b and 3 X 10 “ in d. The dashed outline indicates rough 
dimensions of the SC and approximate location of field of view. Midline blood 
vessels obscure the medial SC, containing cells with the most superior receptive 
fields. b, c, Orientation maps for two additional animals. d, Orientation map 
in an animal in which visual cortex and the transverse sinus had been surgically 
ablated weeks earlier. Granulation tissue over much of the SC limited the 
field of view. 


This suggests that orientation columns in the SC arise even without 
input from the visual cortex (Fig. 4d). 

These results indicate that cells in the SC with similar orientation 
preferences form patches within the retinotopic map of visual space, an 
arrangement that might preclude uniform coverage of different contour 
orientations throughout the visual field. This issue has been addressed 
in species with orientation columns in the visual cortex. There, uniform 
coverage is preserved because the grain of the orientation patches 
is so fine that any point in the visual field activates neurons of all 
possible orientation preferences. To examine whether this applies in 
the mouse SC, we measured the projective fields on the SC for localized 
stimuli (Fig. 5a). Thin gratings produced activation stripes consistent 
with the known retinotopic map (Fig. 5b-d). Stripes had an average width 
of 170-190 µm (full-width at half-maximum, 4 stripes, 2 animals). 
Gratings separated by ~12° in space excited largely non-overlapping 
areas on the SC, with peak-to-peak distances of 100-170 µm (n = 4 pairs, 
2 mice; Fig. 5b-d and Extended Data Figs 7 and 8). By comparison, the 




(radii 1 s.d.) for two representative cells. Image axes correspond to screen 
coordinates; dashed lines demarcate stimulus pixels. Red and blue squares 
indicate consensus receptive field centres of cells preferring vertical and 
horizontal bars, respectively. Arrows indicate projections of spherical 
coordinate axes from the animal’s perspective. Arrow lengths, 5 degrees of 
visual angle. S, superior; T, temporal. i, Schematic of consequences of SC 
orientation columns for mouse vision. Image from Fig. 4a. In the temporal 
visual field, there are patches at high and low elevation where the SC prefers 
vertical bars and at intermediate elevation where it prefers horizontal bars. 
Approximate elevation and azimuth marked in degrees. 
orientation patches spanned 200 µm to 400 µm in width (Fig. 5e; 3 animals), 
corresponding to ~30° of visual angle. Indeed, patches of the 
SC up to 300 µm wide were typically dominated by a particular preferred 
orientation (Fig. 5f). Thus the parcellation of orientation tuning 
is much coarser than the spatial projective field onto the SC. This implies 
that stimuli as large as 30° will be processed with some bias towards a 
certain contour orientation. 
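
The stripe widths quoted above are full widths at half-maximum. A minimal sketch of how such a width can be read off a one-dimensional activation profile is given below; it assumes the profile is expressed as a positive activation magnitude with a zero baseline and uses linear interpolation between samples. The function name and the toy profile are hypothetical and do not represent the authors' analysis or data.

```python
def full_width_half_max(positions, profile):
    """Width of the peak in `profile` at half its maximum value.

    positions: sample locations (e.g. micrometres), increasing.
    profile:   activation magnitude at each location, baseline assumed 0.
    Crossing points are found by linear interpolation. If the profile
    never falls below half-maximum on one side, the edge of the sampled
    range is used for that side.
    """
    peak = max(profile)
    half = peak / 2.0
    i_peak = profile.index(peak)

    left = positions[0]
    for i in range(i_peak, 0, -1):
        if profile[i - 1] < half <= profile[i]:
            frac = (half - profile[i - 1]) / (profile[i] - profile[i - 1])
            left = positions[i - 1] + frac * (positions[i] - positions[i - 1])
            break

    right = positions[-1]
    for i in range(i_peak, len(profile) - 1):
        if profile[i + 1] < half <= profile[i]:
            frac = (profile[i] - half) / (profile[i] - profile[i + 1])
            right = positions[i] + frac * (positions[i + 1] - positions[i])
            break

    return right - left


if __name__ == "__main__":
    # Toy triangular profile sampled every 50 units (illustration only).
    xs = [0, 50, 100, 150, 200, 250, 300, 350, 400]
    ys = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25, 0.0]
    print(full_width_half_max(xs, ys))
```

Comparing such a width with the spacing between patches of different preferred orientation is the comparison the text draws between projective field size and patch size.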

This unexpected result was scrutinized further. Intrinsic reflectance 
represents a bulk signal from the tissue, and the stimuli used to map 
orientation tuning and projective fields might excite two different sets 
of neurons. We therefore returned to two-photon calcium imaging to 
examine individual neurons. For each orientation-tuned neuron in several 
volumes we mapped the visual receptive field with small flashed 
spots (Fig. 5g). The receptive fields were comparable in size (median 
diameter, 10°; median area, 74°) to off-type receptive fields reported 
in anaesthetized mice. Neurons were grouped by their preferred 
orientations into bins centred on 0, 45, 90 or 135°. The receptive field 
centres of cells preferring the same orientation were found clustered in 
visual space, separated from those preferring a different orientation 
(Fig. 5h). Moreover, there was little overlap between the receptive fields 
of cells preferring different orientations (Fig. 5h). Consistent with these 
observations, cells with a given orientation preference responded much 
more strongly to spots flashed in the consensus receptive field centre 
(Fig. 5h) of the cells sharing that preferred orientation (median 100%, 
mean 81%, fraction of peak response; 4 animals, 63 cells) than to the 
receptive field centre of cells with a different preferred orientation 
(median 29%, mean 38%; P < 0.001, Kruskal-Wallis test). Thus, both 
intrinsic imaging and two-photon calcium imaging reveal that adjacent 
orientation columns in the SC survey largely non-overlapping regions 
in space, in violation of position invariance and uniform coverage. These 
orientation patches cover large regions of the visual field (~30°) 
compared to the mouse’s acuity (~2°). Furthermore, the absolute magnitude 
of this inhomogeneity is substantial. The average orientation-tuned 
neuron (OSI = 0.30) has a 2:1 bias in favour of the preferred orientation, 
so the relative gain for orthogonal orientations in adjacent patches 
may differ by a factor of 4. For regions of the visual field spanning 
several tens of degrees, the SC favours one orientation and responds 
less well to stimuli of other orientations (Fig. 5i). 
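
The arithmetic behind the 2:1 and factor-of-4 statements can be checked with a short sketch. It assumes the commonly used definition OSI = (R_pref − R_orth)/(R_pref + R_orth); the definition actually used is given in the Methods and may differ, and the function names here are hypothetical.

```python
def preferred_to_orthogonal_ratio(osi):
    """Response ratio R_pref / R_orth implied by an orientation
    selectivity index, ASSUMING OSI = (R_pref - R_orth) / (R_pref + R_orth)."""
    if not 0.0 <= osi < 1.0:
        raise ValueError("osi must lie in [0, 1)")
    return (1.0 + osi) / (1.0 - osi)


def between_patch_gain_factor(osi):
    """Factor by which the vertical:horizontal response ratio differs
    between a vertical-preferring and a horizontal-preferring patch,
    if both have the same OSI."""
    return preferred_to_orthogonal_ratio(osi) ** 2


if __name__ == "__main__":
    print(preferred_to_orthogonal_ratio(0.30))
    print(between_patch_gain_factor(0.30))
```

Under this assumed definition, an OSI of 0.30 corresponds to a ratio of 13/7, a little under 2:1, and an OSI of exactly 1/3 gives 2:1, for which the between-patch factor is 4.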

It is not apparent how columnar architecture arises in the SC. Perhaps 
afferents from distinct retinal ganglion cell subtypes are routed to different 
orientation patches by different rules. Indeed, the terminal arbors 
of certain retinal ganglion cell types show a vertical columnar structure, 
albeit on a slightly finer scale. Additional hints at non-uniform 
anatomy are the lattices formed deeper in the SC by cholinergic 
and nigro-collicular fibres, again on the scale of several hundred 
micrometres. The tuning field for contour orientation (Fig. 5i) may also 
relate to regional specializations in the SC. For example, stimulation of 
regions of the SC that survey the upper or lower visual field elicits 
avoidance or approach behaviours, respectively. This subdivision 
seems coarser than the observed orientation tuning maps, and a more 
refined study of behaviours supported by the SC, along with further 
exploration of functional architecture in response to diverse stimuli 
across the full retinotopic map, will help in understanding the functional 
significance of these columns in the SC. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


Mice. Experiments were conducted on adult C57BL6/J mice of both sexes (ages 
2-8 months, Jackson labs). All procedures were performed in accordance with 
institutional guidelines. Mice were first anaesthetized with ketamine, xylazine, and 
acepromazine (60 mg per kg, 7.5 mg per kg, and 3 mg per kg, respectively) and placed 
in a stereotaxic device with eyes covered with ophthalmic ointment. A custom head 
plate (titanium, 1 mm thickness, eMachineShop) was bonded to the skull (ESPE 
Adper Scotchbond, 3M), roughly centred on lambda, parallel to the long axis of the 
mouse and at a pitch of 15 ± 5°. In some mice, viral injections were made through a 
small burr hole drilled over the rostral tip of the SC (injection coordinates 
0.2-0.4 mm caudal to interaural line, 0.3-0.7 mm lateral). A glass pipette (25-35 µm 
tip) loaded with a 2:1 mixture of AAV2/1.hSyn1.GCaMP6s.WPRE.SV40 (Penn 
Vector Core) and 20% mannitol in saline was advanced into the tissue and a 
hydraulic injector was used to inject 90-270 nl over several minutes. The pipette was 
left in place for 3 min before progressing to the next depth. Injections were made at 
three depths, typically 1.3, 1.15, and 1 mm below lambda. The pipette was then 
slowly removed and the hole overlaid with Kwik-Cast silicone elastomer (World 
Precision Instruments). Animals were given buprenorphine and carprofen (0.1 mg 
per kg, 5 mg per kg, respectively) for 48 h post-operatively. 

Previous intrinsic signal imaging studies of the SC entailed ablation of the 
overlying visual cortex (VC). This procedure destroys strong reciprocal connections 
between the SC and VC that likely influence response tuning, a potentially serious 
caveat. We noticed that the posteromedial SC is not obscured by cortex and might 
be accessible for imaging without removal of VC. This wedge of the SC lies beneath 
the confluence of the superior sagittal and transverse sinuses (Fig. 1a). Blood 
absorbs infrared and visible light, and these vessels form an opaque barrier that 
stymied initial imaging attempts. Ablation of the transverse sinus in humans with 
arteriovenous malformations typically causes minimal or no neurological 
symptoms, indicating that it is not essential for healthy brain function. Moreover, its 
location within the dura and consequent lack of physical attachment to the brain 
surface suggested the possibility of dislodging it without damaging the underlying 
SC. Several previous studies used ‘plugs’ of glass or transparent silicone to apply 
pressure normal to the brain surface to flatten the tissue and minimize motion 
artefacts; we reasoned that triangular plugs could be used to apply pressure 
parallel to the brain surface, in a fashion analogous to a snowplow, to anteriorly 
displace the transverse sinuses (Fig. 1b, c and Extended Data Fig. 1a-c). With this 
approach, we were able to implant acute or chronic imaging windows and expose 
triangular patches of the SC typically 800-1,000 µm on a side, corresponding to 
~15-25% of the surface area of the SC. We have successfully imaged the SC with 
this preparation from 30 min to 6 months after plug implantation. 

Five days to 3 months after implantation of head plates, mice were given dex- 
amethasone (2 mg per kg), anaesthetized with isoflurane, and immobilized via their 
head plates in a custom holder. In mice previously injected with AAV, a 2-3 mm 
craniotomy was made over the SC, inferior colliculus, and part of the cerebellum, 
and a large flap was opened in the dura with a 30-gauge needle. The tissue was kept 
moist under artificial cerebrospinal fluid (ACSF). ACSF was wicked from the cra- 
niotomy to leave only a thin film of liquid over the SC and a small drop of uncured 
Kwik-Sil was applied to the SC. A plug bonded to a 5 mm circular coverslip was 
mounted on a suction cup and positioned over the craniotomy. The plug was quickly 
advanced downward into the uncured drop of silicone and then anteriorly to dis- 
place the dura and transverse sinuses. Cyanoacrylate (Vetbond, 3M) was applied 
to bond the coverslip to the skull and headplate. After a few minutes, suction was 
released and the suction cup withdrawn. Black dental cement (Ortho-Jet, Lang Dental) 
was then applied over the cyanoacrylate on the skull, head plate, and edges of the 
cranial window and allowed to set for ~30 min. For mice that had not been injected 
with AAV, a similar craniotomy was performed over the SC as well as VC, virus was 
injected into the SC and VC as previously described, and a plug attached to an 
8 mm coverslip was implanted as above.

To aspirate cortex, a craniotomy was performed over the SC and VC. Cortex 
was slowly aspirated and the transverse sinus was severed and reflected. Once 
bleeding had ceased, the craniotomy was filled with uncured Kwik-Sil and a cover- 
slip was pressed in place above and bonded as previously described. 

At least 3 days after implanting cranial windows, mice were habituated to hand- 
ling and head-fixing for at least 3 days before experiments began; mice were given 
at least 7 days after injection to permit GCaMP expression, and indistinguishable 
results were obtained 7 days to 3 months after AAV injection. Mice were head-fixed 
on a 12 cm diameter circular treadmill (Ware Flying Saucer). The underside of the
wheel was painted with alternating stripes of black and silver and illuminated with 
940 nm light-emitting diodes (LEDs). A pair of photodiodes measured reflectance 
of the stripes and thus encoded wheel motion for steps >5 mm. Analogue signals 
were recorded and analysed in Matlab. 

Plugs. Uncured Kwik-Sil (World Precision Instruments) was pressed between two
~2.5 cm square blocks of acrylic (previously sterilized with 70% ethanol) separated
by 0.75 mm shim stock. The silicone was allowed to cure for at least 15 min and
transferred to a sterile Petri dish. A scalpel was used to cut triangular prisms roughly
1 mm tall and 1.5 mm wide. Care was taken to avoid use of any portion of the
silicone sheet containing bubbles or lint. Surfaces of the silicone plugs were cleaned
with transparent adhesive tape to remove dust. A corona treater (Electrotechnic
products) was used to activate the surfaces of a silicone plug and a coverslip (5 or
8 mm, number 1 thickness, Warner) and the plug was placed on the coverslip with
the activated surfaces touching. Plugs were placed in a sterile Petri dish and bonding
was allowed to proceed overnight in a hybridization oven at 60-70 °C. To implant
plugs, small suction cups were fabricated by bevelling a 25-gauge needle to ~45°
and mounting the needle with the aperture facing downward on a
micromanipulator with a 20 ml syringe attached through flexible tubing. A small drop
(~1 mm diameter) of ACSF was set on a clean block of acrylic and the tip of the
syringe was placed in the drop. Kwik-Sil was applied over the tip and allowed to set
for at least 15 min before use.

Two-photon microscope. Two-photon imaging was performed on a custom-built
microscope controlled by software written in Labview (National Instruments). A
mode-locked Ti:sapphire laser (Mai-Tai DeepSee, Newport) with group delay
dispersion compensation was scanned by galvanometers (Cambridge) through a 20×,
1 NA water-immersion objective (Olympus). GCaMP6s was excited at 920 nm and
laser power at the sample plane was typically 15-50 mW. Imaging in the SC was
performed 50 µm to 300 µm below the surface, and imaging in layer 2/3 of VC was
performed 150 µm to 300 µm below the surface. A 300 × 150 µm field of view was
scanned at 8 Hz as a series of 300 × 150 pixel images. Emitted light was collected
with a T600/200dcrb dichroic (Chroma) and a 610dxcr dichroic (Chroma) to split
green and red light (no red fluorophore was used in this study); green light passed
through a HQ600/200M-2P bandpass filter (Chroma) and was detected by a
multialkali photomultiplier tube (R3896, Hamamatsu). Artefacts of the strobed stimulus
were eliminated by discarding 10 pixels on either end of each line to yield 280 × 150
pixel images.
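
The edge-cropping step above can be illustrated with a minimal sketch. It assumes each frame is stored as a list of scan lines (150 lines of 300 pixels); the function name `crop_line_edges` and the list-based storage are illustrative choices, not the acquisition software used in the study.

```python
def crop_line_edges(frame, margin=10):
    """Drop `margin` pixels from both ends of every scan line.

    `frame` is a list of scan lines (rows); each row is a list of pixel values.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    cropped = []
    for row in frame:
        if len(row) <= 2 * margin:
            raise ValueError("scan line is too short for the requested margin")
        cropped.append(row[margin:len(row) - margin])
    return cropped


# A frame of 150 scan lines with 300 pixels per line, as described in the text.
frame = [[0] * 300 for _ in range(150)]
trimmed = crop_line_edges(frame)
print(len(trimmed), len(trimmed[0]))  # 150 280
```

At 8 frames per second and 150 lines per frame, the line rate is 8 × 150 = 1,200 Hz, which matches the strobe rate quoted for the monitor.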

Intrinsic imaging microscope. Reflectance of a 735 nm LED (Thorlabs) was
collected using a CCD camera (Flea3, Point Grey), through a 5×, 0.14 NA air objective
(Mitutoyo) used as a 2.5× objective with a short (f = 100 mm) tube lens. Images of
640 × 480 pixels at 8-bit resolution were acquired at 114 or 120 Hz and binned to
6 Hz, 160 × 120 pixels. Acquisition and analysis used custom software written in
Labview and Matlab (Mathworks).
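
The binning factors follow from the numbers above: 640 × 480 to 160 × 120 pixels is a 4 × 4 spatial bin, and 120 Hz (or 114 Hz) to 6 Hz is a 20-frame (or 19-frame) temporal bin. The sketch below shows one way such binning could be done; it assumes bins are averaged (the text does not say whether values were averaged or summed), and `bin_movie` is a hypothetical helper rather than the authors' code.

```python
def bin_movie(frames, t_bin, xy_bin):
    """Average a movie over `t_bin` consecutive frames and `xy_bin` x `xy_bin` pixel blocks.

    `frames` is a list of frames; each frame is a list of rows of numbers.
    Trailing frames, rows, or columns that do not fill a complete bin are dropped.
    """
    n_t = len(frames) // t_bin
    n_y = len(frames[0]) // xy_bin
    n_x = len(frames[0][0]) // xy_bin
    count = t_bin * xy_bin * xy_bin
    binned = []
    for t in range(n_t):
        out_frame = []
        for y in range(n_y):
            out_row = []
            for x in range(n_x):
                total = 0.0
                for dt in range(t_bin):
                    frame = frames[t * t_bin + dt]
                    for dy in range(xy_bin):
                        row = frame[y * xy_bin + dy]
                        for dx in range(xy_bin):
                            total += row[x * xy_bin + dx]
                out_row.append(total / count)
            out_frame.append(out_row)
        binned.append(out_frame)
    return binned


# Bin factors implied by the numbers in the text.
print(640 // 160, 480 // 120)  # 4 4
print(120 // 6, 114 // 6)      # 20 19

# Tiny demonstration: 4 frames of 4 rows x 8 columns -> 2 frames of 2 rows x 4 columns.
movie = [[[float(t)] * 8 for _ in range(4)] for t in range(4)]
small = bin_movie(movie, t_bin=2, xy_bin=2)
print(len(small), len(small[0]), len(small[0][0]), small[0][0][0], small[1][0][0])  # 2 2 4 0.5 2.5
```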

Visual stimuli. Stimuli were generated in Psychtoolbox3 (Matlab) and presented
on an LCD screen (Dell, U2312HM) centred 23 cm away from the mouse's eye,
angled at 20° in pitch and yaw to minimize fisheye distortion. Stimuli were presented
on a square (1,080 × 1,080 pixel) region of the screen. Between experiments the
monitor was maintained at a constant background grey level. To minimize
interference of the stimulus with fluorescence detection, the monitor was strobed for
2 µs at the end of each scan line (1,200 Hz; luminance of grey screen ~1.25 cd m⁻²,
maximum brightness ~1/80 of unmodified monitor). Mice see red poorly, and all
stimuli presented used only the green and blue channels of the monitor. The red
channel was used to convey stimulus timing to synchronize with fluorescence
acquisition; red bars flickered periodically at the bottom of the screen, which was
covered with black tape. Drifting bars were 40 pixels wide (2-3°) and drifted at a
speed of 240 pixels per s (12-18° s⁻¹). Flashed spots were presented as a 10 × 10 grid
of 5-8° black squares. Each spot appeared for 500 ms and was followed by 500 ms
of grey screen. All stimuli in two-photon experiments were presented in a
pseudorandom sequence with interspersed blank periods (sampling with replacement)
within each stimulus block (typically 6-8 blocks per experiment), with a different
random seed for each block.
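
One possible reading of the block structure above is sketched here: each trial is drawn with replacement from the set of stimuli plus a blank, and every block uses its own seed. The eight directions match those named in Extended Data Fig. 3; the trial count per block, the seed values, and the helper name `make_block` are assumptions for illustration only.

```python
import random


def make_block(stimuli, n_trials, seed, blank="blank"):
    """Draw one block of trials with replacement from the stimuli plus a blank."""
    rng = random.Random(seed)  # independent generator, so each block is reproducible
    return rng.choices(list(stimuli) + [blank], k=n_trials)


directions = [0, 45, 90, 135, 180, 225, 270, 315]  # bar directions in degrees
# Six blocks, each with a different seed (here simply the block index).
blocks = [make_block(directions, n_trials=18, seed=block_index) for block_index in range(6)]
```

Keeping the seed per block means any block can be regenerated later for analysis without storing the full trial list.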

For intrinsic imaging, the same monitor was used. Due to the larger number of 
repetitions required, in some experiments the drifting bar stimuli were presented 
with interspersed blank frames omitted. To map the projective fields of slit gratings, 
a square wave grating with 100% contrast, spatial frequency of 0.5-0.8 cycles per 
degree (cpd), and temporal frequency of 1 Hz, switching direction after each cycle, 
was presented for 8 s through an aperture of the same dimensions as the drifting 
bars. To map the projective fields of grating patches, the same square grating was 
presented through a square aperture (10-15°) that alternated between two
abutting locations every 8 s, changing to a randomly chosen (with replacement)
cardinal direction every 1 s.

Calcium imaging analysis. Brain motion during imaging was corrected using
TurboReg (ImageJ) or software written in Python. Elliptical regions of interest
(ROIs) were drawn manually in Matlab, fluorescence traces were extracted, and
neuropil signals were subtracted. Neuropil tended to share the orientation tuning of
the embedded cells (Extended Data Fig. 5). Because much of the neuropil derives from
processes of the local cells, if local cells are tuned alike, the surrounding neuropil
will show similar tuning, as in orientation columns in cat visual cortex. Thus, this
observation was consistent with the single-cell data, but also presented a potential
experimental confound, because out-of-focus fluorescence from neuropil leaks into
the signals recorded from individual cells (Extended Data Fig. 5b), and
contamination by a tuned neuropil signal could bias measurements of a cell's true preferred
orientation. The true fluorescence signal of a neuron is c = r − (f × n), with r the
raw fluorescence signal of the ROI containing the cell, f the out-of-focus neuropil
contamination factor, and n the fluorescence signal of the surrounding neuropil.
To estimate the extent of this contamination, we identified small non-vertical blood
vessels, as in previous studies, and cell-sized 'holes' that likely corresponded to
uninfected cells. The ratio of their brightness to that of the surrounding neuropil
was measured, and this routinely yielded estimates of f = ~0.5, a value similar to
that obtained by another group using an objective with NA 1 (ref. 39). As a result,
all data presented in this study were processed using this value of f. A shell of radius
20 µm centred on each cell was taken from the masked image, excluding all cells,
and used to calculate the corrected signal for neuropil contamination. Cells with
bright, filled nuclei, known to have slower kinetics and/or aberrant responses,
were excluded from analysis but included in masks for neuropil subtraction.
Nevertheless, the true value of f may vary slightly within a field of view, and to
ensure that the central results were robust to this variation we repeated the analysis
with values of f from 0.3 to 0.9. For each value of f in this range, neurons were much
more similar to the surrounding neuropil and each other than chance (Extended
Data Fig. 5c, d), confirming that cells are indeed tuned similarly to their neighbours
and the surrounding neuropil.
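
The correction c = r − (f × n) and the robustness sweep over f can be written compactly. This is a minimal sketch with made-up trace values; `correct_neuropil` is a hypothetical name, and the traces are assumed to be equal-length lists of fluorescence samples.

```python
def correct_neuropil(raw, neuropil, f=0.5):
    """Return c = r - f * n for each time point of an ROI trace."""
    if len(raw) != len(neuropil):
        raise ValueError("traces must have the same length")
    return [r - f * n for r, n in zip(raw, neuropil)]


raw = [10.0, 12.0, 20.0, 11.0]      # ROI signal r (illustrative values)
neuropil = [4.0, 4.0, 6.0, 4.0]     # shell signal n (illustrative values)
print(correct_neuropil(raw, neuropil))  # [8.0, 10.0, 17.0, 9.0]

# Repeat the correction over the range of f explored in the text (0.3 to 0.9).
sweep = {f: correct_neuropil(raw, neuropil, f) for f in (0.3, 0.5, 0.7, 0.9)}
```

Any downstream tuning measure can then be recomputed for each entry of `sweep` to check that a result does not hinge on the exact value of f.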

On occasion, frames with large-amplitude motion were not registered correctly.
These frames can be easily identified because activity should cause only increases
in the fluorescence signal for an ROI, while movement of a cell into or out of an
ROI is likely to be associated with either increases or decreases in fluorescence.
To detect these events, we estimated the baseline fluorescence for each ROI as the
mean of the lower half of its fluorescence intensity over the course of the movie,
and discarded frames in which the fluorescence of any ROI was more than 3 s.d.
below baseline. Slow baseline fluctuations were removed by subtracting the eighth
percentile value from a moving window 15 s wide centred on each frame; the
mean value of the eighth percentile value for the entire trace was then added back
to allow measurement of fold changes. The response to each direction (ΔF/F) was
measured as (F/F0) − 1, with F the instantaneous ROI intensity and F0 the mean
fluorescence intensity during stimulus blanks (grey screen). ΔF/F values for each
presentation of a stimulus were averaged and the peak ΔF/F for each direction was
used to compute orientation preference; similar results were obtained using the
mean ΔF/F for each ROI. Responses to directions separated by 180° were summed
to compute orientation preference. The orientation selectivity index (OSI) was
calculated as (Rpref − Rortho)/(Rpref + Rortho), with Rpref the ΔF/F to the orientation
eliciting the strongest response and Rortho the ΔF/F to the orthogonal orientation.
Cells with OSI ≥ 0.15 (~4:3 preferred:null response) were classified as
orientation-selective. The preferred orientation was defined as the weighted vector sum over
the range of presented stimulus angles.
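
The tuning measures described above can be sketched as follows. The example takes one peak ΔF/F value per direction, sums opposite directions, and computes the OSI and a vector-sum preferred orientation. The doubled-angle form of the vector sum is an assumption (it is the usual convention for orientation data, but the text does not spell it out), and all function names and response values are illustrative.

```python
import math


def orientation_responses(direction_responses):
    """Sum responses to directions 180 degrees apart; keys are degrees in [0, 360)."""
    summed = {}
    for direction, response in direction_responses.items():
        orientation = direction % 180
        summed[orientation] = summed.get(orientation, 0.0) + response
    return summed


def osi(orientation_resp):
    """(R_pref - R_ortho) / (R_pref + R_ortho), with R_pref the strongest orientation."""
    pref = max(orientation_resp, key=orientation_resp.get)
    ortho = (pref + 90) % 180
    r_pref, r_ortho = orientation_resp[pref], orientation_resp[ortho]
    return (r_pref - r_ortho) / (r_pref + r_ortho)


def preferred_orientation(orientation_resp):
    """Response-weighted vector sum on the doubled-angle circle, returned in [0, 180)."""
    x = sum(r * math.cos(math.radians(2 * theta)) for theta, r in orientation_resp.items())
    y = sum(r * math.sin(math.radians(2 * theta)) for theta, r in orientation_resp.items())
    return (math.degrees(math.atan2(y, x)) / 2) % 180


# Illustrative peak dF/F per direction (degrees -> response).
peak_dff = {0: 0.05, 45: 0.1, 90: 0.2, 135: 0.1, 180: 0.05, 225: 0.1, 270: 0.2, 315: 0.1}
resp = orientation_responses(peak_dff)
selective = osi(resp) >= 0.15  # threshold quoted in the text
```

For a 4:3 ratio of preferred to orthogonal response, the index is (4 − 3)/(4 + 3) = 1/7 ≈ 0.14, which is where the approximate 0.15 threshold comes from.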

To map receptive fields with flashed spots, stimuli were presented in rapid
succession, necessitating a faster baseline filter (3 s) to compensate for the slow decay
kinetics of the GCaMP6s. The consensus centre square for each field of view,
averaging all cell bodies and neuropil, was identified and the mean responses to spots
on the periphery, well outside the receptive fields, and blanks were averaged to
compute baseline fluorescence F0 for each ROI. The response ΔF/F was computed as
(F/F0) − 1, with F representing the peak of the averaged response of all presentations
for each spot location. To compare receptive field overlap, preferred orientation for
each cell was binned to 0, 45, 90, or 135°, according to the presented orientation
eliciting the strongest response. Next, the number of cells preferring each
orientation was determined, and the orientations preferred by the largest fraction of cells
(orientation 1) and second-largest fraction of cells (orientation 2) were analysed.
The peak square location for each cell preferring orientation 1 or orientation 2 was
defined as the location that elicited the maximal responses from the plurality of
cells sharing each preferred orientation. A two-dimensional Gaussian was fit to
each orientation-tuned cell's responses to the flashed spots and the receptive field
size was set as the area of an ellipse of radius of one s.d., using a spot area of 7.5°.

Intrinsic imaging analysis. No temporal or spatial filtering of intrinsic imaging
data was performed. To map orientation columns, mean reflectance changes while
bars drifted in both directions were summed for each cardinal axis. An ROI was
drawn manually over the SC to exclude signals from the large surrounding blood
vessels, and the ratio of responses to the vertical and horizontal axes was determined.
To measure the projective fields of slit gratings, the ratio of the mean reflectance to
a given grating position was taken to the mean reflectance when orthogonal bars
were presented across the SC. To map projective fields with grating patches, the
ratio of the mean reflectance when the stimulus was at either of the two locations
was determined. Measurements of projective field peak-to-peak distances were
made in ImageJ by drawing a rectangular ROI over each projective field, roughly
orthogonal to the bar, and measuring the averaged line profile over the ROI. These
line profiles were smoothed and full-width at half-maximum was measured in
Matlab. To measure inhomogeneity indices, pixels were classified as preferring
horizontal or vertical if they were in the upper 40% of responses to either
orientation, with the remaining pixels classified as untuned. A window of indicated size
was rastered across the orientation map. At any position in which more than half
of the pixels were within the SC mask, the inhomogeneity index was measured as
|h − v|/(h + v + u), with h and v the number of pixels preferring horizontal
and vertical bars, respectively, and u the number of untuned pixels. To generate
the image in Fig. 5i, lines were manually fit to the projections of slit gratings (as in
Fig. 5a-c) on the surface of the SC. The resulting grid was then aligned to a grid of
the positions of the stimuli in the visual field using a thin plate spline in Matlab,
and the same transform applied to the orientation map from that animal.
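
The inhomogeneity index can be sketched as a sliding-window computation over a map of pixel labels. The version below assumes pixels are already labelled 'h', 'v', or 'u', and that only pixels inside the SC mask are counted within each window; the text does not state how out-of-mask pixels were treated, so that choice and the function names are assumptions.

```python
def inhomogeneity_index(labels):
    """|h - v| / (h + v + u) for an iterable of pixel labels 'h', 'v', or 'u'."""
    labels = list(labels)
    h = labels.count("h")
    v = labels.count("v")
    u = labels.count("u")
    if h + v + u == 0:
        raise ValueError("window contains no labelled pixels")
    return abs(h - v) / (h + v + u)


def raster_index(label_map, mask, size):
    """Slide a size x size window over the map; keep positions mostly inside the mask."""
    results = {}
    n_rows, n_cols = len(label_map), len(label_map[0])
    for top in range(n_rows - size + 1):
        for left in range(n_cols - size + 1):
            cells = [(r, c) for r in range(top, top + size) for c in range(left, left + size)]
            inside = [(r, c) for r, c in cells if mask[r][c]]
            if len(inside) * 2 > len(cells):  # more than half of the window lies within the mask
                results[(top, left)] = inhomogeneity_index(label_map[r][c] for r, c in inside)
    return results


# Toy 2 x 4 orientation map, entirely inside the mask.
label_map = ["hhvv", "hhvu"]
mask = [[True] * 4, [True] * 4]
index_map = raster_index(label_map, mask, size=2)
print(index_map)  # {(0, 0): 1.0, (0, 1): 0.0, (0, 2): 0.75}
```

A window that sits entirely within one orientation domain scores 1, while a window straddling equal numbers of horizontal- and vertical-preferring pixels scores 0.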

Statistical methods. No statistical method was used to predetermine sample size.
Statistical comparisons were performed as Kruskal-Wallis tests, a non-parametric
test that extends the Mann-Whitney U-test to multiple groups, or as Monte Carlo
simulations. These tests do not assume normality of the data; nevertheless, all
comparisons described yielded similar P values with statistical methods that assume
normality (for example, one-way ANOVA).
Extended Data Figure 1 | Plugs and head plates used for this study. a, Top
view of triangular silicone plug attached to a 5 mm coverslip. b, Side view of the
plug from a. c, Suction cup used to position plug. d, Standard headplate
with 8 mm aperture. e, Monitor tilt used to reduce fisheye distortion.
Perspective is exaggerated for clarity. The monitor was tilted such that it was
20° from vertical, with the top nearer the animal, and 20° from the animal's
anterior-posterior axis, with the right edge closer to the animal. Note that
stimuli were presented in a square area on the right side of the rectangular
monitor. f, g, Azimuth and elevation in degrees for each pixel within the
inscribed square area of the monitor on which stimuli were displayed.
Values are given with respect to standard stereotaxic coordinates; because
headplates were implanted at ~15° angles with respect to this plane, bars
on screen are tilted. Curvature of elevation bars reflects the fact that
iso-elevation lines are curved, not straight, much like latitude lines on a globe.
Extended Data Figure 2 | Orientation tuning does not reflect distortion
effects of flat screen. a, Plot of foot point of monitor (FP), the point at which a
line from the eye is perpendicular to the tangent screen, and all mapped
receptive field centres for orientation-tuned SC neurons in one animal.
Each line segment is positioned at a cell's receptive field centre and angled to
reflect its preferred orientation. Orange line indicates the radial orientation
relative to the FP for an example cell. Fisheye distortion will cause bars along the
radial orientation relative to the foot point to appear relatively wider and faster
than orthogonal bars along the tangential orientation, potentially biasing
responses towards or away from the radial orientation. b, Enlarged view of inset
area in a to show orientations more clearly. Note sharp transition in preferred
orientation from bottom to top of panel relative to difference in radial
orientation. Also note that the preferred orientation and the radial orientation
vary with opposite handedness. c, Difference between radial orientation and
preferred orientation for all cells in the plot. If the orientation map were due
to fisheye distortion, preferred orientations should be similar to the radial
orientation and the distribution should be centred at 0. Note that this
distribution is biased away from 0 and centred between 45 and 90°. d-f, As in
a-c for another animal. Note sharp transition in preferred orientations as in
b, but with opposite handedness, and centring of distribution of preferred
orientations between 0 and 45°. g-i, As in a-c for another animal. Note
sharp transition in preferred orientations and bias of cells to orientations
orthogonal to radial orientation. j-l, As in a-c for another animal. Note
sharp transition in this field and a group of cells whose preferred orientations
are close to radial.
Extended Data Figure 3 | Sample responses of neurons in V1. a, b, Average ΔF/F ± s.d. of two V1 neurons to 7 repetitions each of 8 directions of bar motion and
a blank screen. Insets are polar plots for each cell.
Extended Data Figure 4 | Monte Carlo simulation reveals significance of
observed local similarity in the SC. a, Absolute value of the difference in
preferred orientations plotted against horizontal distance in the SC. Orange line
indicates linear fit to the difference in preferred orientation as a function of
horizontal separation, yielding a line of best fit with a slope of +22° per 100 µm.
b, As in a after shuffling all cell positions. A total of 10° independent shuffles
were performed; shown are results from the final shuffle. This yielded a
distribution of slopes with a mean ± s.d. of (0 ± 1°) per 100 µm. c, Histogram of
slopes of best-fit lines from 10° independent Monte Carlo simulations.
Arrowhead indicates slope from a. d-f, As in a-c for data from V1.
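
The shuffle procedure in this legend can be sketched as follows: fit a line to pairwise orientation difference against pairwise distance, then repeat the fit after permuting cell positions to build a null distribution of slopes. The synthetic cells, the number of shuffles, the one-sided comparison, and the function names are illustrative assumptions, not the authors' code or data.

```python
import math
import random


def orientation_difference(a, b):
    """Absolute difference between two orientations in degrees, folded into [0, 90]."""
    d = abs(a - b) % 180
    return min(d, 180 - d)


def pairwise_slope(positions, orientations):
    """Least-squares slope of orientation difference against distance over all cell pairs."""
    xs, ys = [], []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            xs.append(math.dist(positions[i], positions[j]))
            ys.append(orientation_difference(orientations[i], orientations[j]))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def shuffle_test(positions, orientations, n_shuffles=1000, seed=0):
    """Return the observed slope and the fraction of shuffles with a slope at least as large."""
    rng = random.Random(seed)
    observed = pairwise_slope(positions, orientations)
    shuffled = list(positions)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # reassign positions to cells at random
        if pairwise_slope(shuffled, orientations) >= observed:
            count += 1
    return observed, count / n_shuffles


# Synthetic example: 12 cells on a line, preferred orientation drifting 5 degrees per 10 um.
positions = [(10.0 * i, 0.0) for i in range(12)]
orientations = [5.0 * i for i in range(12)]
observed_slope, p_value = shuffle_test(positions, orientations, n_shuffles=200)
```

In this synthetic case the observed slope is 0.5° per µm (50° per 100 µm), and shuffled maps almost never reach it, so the estimated P value is close to zero.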
Extended Data Figure 5 | Orientation tuning of neuropil. a, Schematic of
cells and surrounding neuropil shell. Signals are extracted from each cell
and from the neuropil within 20 µm. The corrected signal of a cell c within an
ROI r is c = r − (f × n), with n the signal of the neuropil shell and f the fractional
contamination by out-of-focus neuropil. b, Orientation preferences for
neuropil shells of orientation-tuned cells in Fig. 2a. c, Difference in preferred
orientation of neurons and their neuropil shells over a range of values of
neuropil subtraction coefficient f. Dashed horizontal line indicates chance.
Each black line reflects median values from a single image volume; orange line
is median value for all cells from 7 volumes. d, Effects are robust over a range
of f values. Plotted are mean differences in preferred orientations against
distance ± s.e.m. as in Fig. 2d, from which the orange trace is reproduced.
Because the neuropil is also sharply tuned, using high values of f will reduce
the apparent similarity of neighbouring cells' orientation preferences.
Nonetheless the similarity remains significant even at the excessively high f
value of 0.9.
Extended Data Figure 6 | Distribution of preferred orientations in the SC. a, Preferred orientations of cells in the SC according to the presented orientation
eliciting the strongest response. b, Preferred orientations of cells in the SC calculated from vector sums of responses to all orientations. 
Extended Data Figure 7 | Orientation maps do not reflect fisheye distortion.
a, Grating patches were presented at the foot point (FP), the point at which a
line from the eye is perpendicular to the tangent screen, and at locations on the
screen displaced from the FP along radial orientations of 0, 45, and 90°.
b, Predicted map if fisheye distortion caused orientation tuning to be biased
towards the radial orientation with respect to the FP. c, Projective field of foot
point (black, indicated with white star) and patch at 0° (directly lateral on
screen) from foot point (white patch, black star). d, As in c for patches at 90 and
45° with respect to foot point. e, Orientation map for this animal. Blue areas
prefer horizontal bars, red areas prefer vertical bars, and arrows indicate lines
from projective field of foot point to projective fields of patches. f, Expected
orientation map according to distortion hypothesis. Note that the area at the
projective field of the foot point should be untuned, and a line from the
projective fields of the FP and a spot located at 0° relative elevation should
pass from untuned areas to progressively more horizontal-preferring areas.
Instead, it passes from a horizontal-preferring area at the FP to
vertical-preferring areas as it moves to greater eccentricity. Trajectories along other
projections of radial orientations (45 and 90°) are similarly poor fits to
prediction. g, h, As in b and c for another animal. Orientation map for this
animal in o also does not match prediction of fisheye distortion hypothesis.
Arrow indicates shadow of blood vessel. i, j, Checkerboard pattern before
and after 'pre-distortion' to offset fisheye effect. This pre-distortion was applied
to change bar width by 1/cosine(θ), with θ the eccentricity from the FP, for both
vertical and horizontal bar stimuli. k, l, Orientation maps for an animal in
response to standard (k) and 'pre-distorted' (l) bar stimuli. In this animal the
transverse sinus was not fully retracted and partially obscures the field of view.
Note similarity of patterns in k and l. m, n, As in k, l, for the animal in
c-e imaged on a different day. Comparison of maps in e and m reveals
inter-trial variability, which is comparable to variability between standard and
pre-distorted stimuli (m and n). o, p, As in k, l for a third animal. Map in o is
overlaid with projective fields of points in visual field as in e. The reflectance
change ΔR/R from black to white is 12 × 10⁻⁴ (c), 19 × 10⁻⁴ (d), and
18 × 10⁻⁴ (g, h). The reflectance change ΔR/R from red to blue is 4 × 10⁻⁴
(e, k, o), 5 × 10⁻⁴ (l, p), 7 × 10⁻⁴ (m), and 9 × 10⁻⁴ (n).
Extended Data Figure 8 | Alternative mapping stimulus reveals similar
projective fields. a, Stimulus. Square grating patches alternate between two
adjacent locations every 8 s. At each location the grating switches orientation
randomly at 1 Hz. b, Map of responses elicited in animal from Fig. 4a.
Responses to the two gratings span small patches on the surface of the SC.
The reflectance change ΔR/R from black to white is 2 × 10⁻⁴. c, As in b for
animal from Fig. 4c. The reflectance change ΔR/R from black to white is
2 × 10⁻⁴.

Mechanosensory interactions drive collective behaviour in Drosophila

Collective behaviour enhances environmental sensing and decision-making
in groups of animals. Experimental and theoretical investigations
of schooling fish, flocking birds and human crowds have
demonstrated that simple interactions between individuals can
explain emergent group dynamics. These findings indicate the
existence of neural circuits that support distributed behaviours, but the
molecular and cellular identities of relevant sensory pathways are
unknown. Here we show that Drosophila melanogaster exhibits
collective responses to an aversive odour: individual flies weakly avoid
the stimulus, but groups show enhanced escape reactions. Using
high-resolution behavioural tracking, computational simulations,
genetic perturbations, neural silencing and optogenetic activation we
demonstrate that this collective odour avoidance arises from
cascades of appendage touch interactions between pairs of flies.
Inter-fly touch sensing and collective behaviour require the activity of
distal leg mechanosensory sensilla neurons and the mechanosensory
channel NOMPC. Remarkably, through these inter-fly
encounters, wild-type flies can elicit avoidance behaviour in mutant animals
that cannot sense the odour, a basic form of communication. Our
data highlight the unexpected importance of social context in the
sensory responses of a solitary species and open the door to a
neural-circuit-level understanding of collective behaviour in animal groups.

Drosophila melanogaster is classified as a solitary species but flies
aggregate at high densities (>1 fly per cm²) to feed (Extended Data
Fig. 1a, b and Supplementary Video 1), providing opportunities for
collective interactions. Although groups affect circadian rhythms and
dispersal in Drosophila, how social context influences individual
sensory behaviours is unknown. To study this question, we developed
an automated behavioural assay to track responses of freely walking
flies to laminar flow of air or an aversive odorant, 5% carbon dioxide
(CO₂). Odour was presented to one half of a planar arena for 2 min
(Fig. 1a and Extended Data Fig. 1c, d). Avoidance behaviour was
quantified as the percentage of time a fly spent in the air zone during
the second minute of a trial (Fig. 1b, c). Unexpectedly, isolated flies spent
very little time avoiding this odour (Fig. 1d), despite the aversion to CO₂
observed in other assays. However, increasing the number of flies
was associated with substantial increases in odour avoidance (Fig. 1d
and Extended Data Fig. 1e). This effect peaked at 1.13 flies per cm², a
density typical for fly aggregates (Extended Data Fig. 1b) and was only
apparent for flies in the odour zone (Fig. 1e and Extended Data Fig. 1f).
Time-course analysis revealed that, within only a few seconds after
odour onset, a larger proportion of flies in high-density groups had
left the odour zone compared to isolated individuals (Fig. 1f;
comparing 0.06 against 1.13 flies per cm², P < 0.05 for a Mann-Whitney
U-test from 0.6 s onwards). Additionally, the motion of flies after odour
onset was coherent at higher densities, with flies moving in the same
direction, out of the odour zone; this effect was not observed for flies in
the air zone (Extended Data Fig. 1g, h).
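
The avoidance measure defined above (percentage of time in the air zone during the second minute) can be written as a short function. This sketch assumes a tracked x coordinate per frame, with air on the left of the boundary and odour on the right, as in Fig. 1a; the frame rate, the boundary position, and the function name are hypothetical.

```python
def time_avoiding_odour(x_positions, boundary_x, frame_rate, start_s=60.0, end_s=120.0):
    """Percentage of frames between start_s and end_s spent on the air side (x < boundary_x)."""
    first = int(round(start_s * frame_rate))
    last = int(round(end_s * frame_rate))
    window = x_positions[first:last]
    if not window:
        raise ValueError("trajectory does not cover the analysis window")
    in_air = sum(1 for x in window if x < boundary_x)
    return 100.0 * in_air / len(window)


# Toy trajectory sampled at 1 Hz: 90 s in the odour zone, then 30 s in the air zone.
xs = [5.0] * 90 + [-5.0] * 30
print(time_avoiding_odour(xs, boundary_x=0.0, frame_rate=1.0))  # 50.0
```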

To determine the basis of these global behavioural differences, we
examined the locomotion of individual flies. Single animals are typically
sedentary but walk more when exposed to CO₂ (Extended Data Fig. 2a, b).
In groups, however, we discovered that 63% of the time, the first walking
response of a fly after odour onset coincided with proximity to a
neighbouring fly (an 'Encounter': distance to a neighbouring fly < 25% body
length; Fig. 2a-c and Supplementary Video 2). These Encounters were
more frequent with increasing group density (Fig. 2d). Moreover, walking
bouts (velocity > 1 mm s⁻¹) initiated during an Encounter ('Encounter
Responses') were significantly longer than those spontaneously initiated
in isolation (Fig. 2e). These observations indicated that inter-fly
interactions might contribute to the enhanced odour avoidance of groups of flies.
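
The two thresholds in this paragraph (walking at more than 1 mm s⁻¹, and an Encounter at less than 25% of a body length) can be illustrated with a small sketch. It assumes per-frame speeds in mm s⁻¹ and point coordinates in mm, and measures distance between single tracked points; how distance was actually measured between fly bodies is not stated here, so this is only one possible reading. The body length of 2.5 mm and the function names are hypothetical.

```python
import math


def walking_bouts(speeds, threshold=1.0):
    """Return (start, end) frame indices where speed exceeds threshold (mm/s); end is exclusive."""
    bouts, start = [], None
    for i, s in enumerate(speeds):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            bouts.append((start, i))
            start = None
    if start is not None:
        bouts.append((start, len(speeds)))
    return bouts


def is_encounter(focal_xy, neighbour_xys, body_length, fraction=0.25):
    """True if any neighbour lies closer than `fraction` of a body length."""
    return any(math.dist(focal_xy, xy) < fraction * body_length for xy in neighbour_xys)


print(walking_bouts([0, 0, 2, 3, 0.5, 0, 4]))  # [(2, 4), (6, 7)]
print(is_encounter((0.0, 0.0), [(0.5, 0.0), (5.0, 5.0)], body_length=2.5))  # True
```

A walking bout whose first frame coincides with a true `is_encounter` result would then be counted as an Encounter Response under these assumptions.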

We examined this possibility initially by computational simulation 
of the olfactory assay. The dynamics of our simulation were driven by 
three phenomena observed in behavioural assays (Fig. 2f). First, flies 
initiate more spontaneous bouts of walking in odour than in air (Extended 
Data Fig. 2a, b). Second, flies are more likely to turn and retreat after 
entering the odour zone from the air zone (Extended Data Fig. 2c). 
Third, close proximity to another fly elicits Encounter Responses in 
stationary flies (Fig. 2e and Extended Data Fig. 2d). Importantly, these 
elements could reproduce collective behaviour: higher numbers of 
simulated flies exhibited greater avoidance (Fig. 2g). While changing 
the olfactory parameters preserved stronger responses in groups than 
isolated individuals (Extended Data Fig. 2e-h), diminishing the Encoun- 
ter Response probability could abolish and even reverse collective be- 
haviour (Fig. 2h). These results suggested that Encounter Responses 
are a crucial component of Drosophila group dynamics. 

To experimentally test the role of inter-fly interactions in collective
behaviour, we sought to explain the mechanistic basis of Encounter
Responses. Although our olfactory experiments were performed in the
dark (Fig. 3a), the presence of light did not diminish Encounter
Response frequency (Fig. 3a). Volatile chemicals are known modulators
of many social behaviours, but putatively anosmic flies (lacking known
olfactory co-receptors) did not show reduced Encounter Responses (Fig. 3a).
By contrast, disruption of the mechanosensory channel NOMPC
significantly diminished Encounter Response frequency (Fig. 3a). These data
suggested that mechanosensing is required for Encounter Responses.

By observing groups of flies at high spatiotemporal resolution, we 
found that active flies elicited motion in stationary animals through 
gentle touch of peripheral appendages (legs and wings; Fig. 3b and 
Supplementary Video 3). Leg touches took place exclusively on distal 
segments (Fig. 3b, inset) and resulted in spatially stereotyped walking 
reactions (Fig. 3c). These reactions were kinematically indistinguish- 
able from Encounter Responses (compare Extended Data Fig. 3c and e; 
two-sample Kolmogorov-Smirnov test, P = 0.07; see Methods). This
analysis indicates that appendage touch is the stimulus that elicits
Encounter Responses. The precise stereotypy of these locomotor res-
ponses, similar to cockroach escape reactions, implies their depend-
ence upon somatotopic neural circuits linking touch with movement.

Figure 1 | Collective odour avoidance in Drosophila. a, Image of flies
(triangles) and their trajectories (dashed lines) during 2 s in a two-choice
olfactory assay. 5% CO2 (‘Odour’) flows through the right half while air flows
through the left half. Two densities of flies are shown (0.06 and 1.50 flies per
cm²). The scale bar is 2.5 mm. b, Schematic of the odour avoidance experiment.
Flies in the odour zone at stimulus onset (t = 0) are measured for the time
spent in the non-odour zone during the second minute of the experiment
(‘Time avoiding odour (%)’). c, Flies (white triangles) with a low (top) or high
(bottom) per cent time avoiding odour. d, e, The per cent time avoiding the
odour (mean and s.d.) for five different densities of flies starting in the odour
zone (black bars) (d) and four densities of flies starting in the air zone (grey
bars) (e). n = 37, 38, 36, 35, and 38 experiments for 0.06, 0.38, 0.75, 1.13, and 1.5
flies per cm², respectively. In this and all subsequent figures, unless otherwise
stated, a single asterisk (*) denotes P < 0.05 and a double asterisk (**)
denotes P < 0.01 for a Bonferroni-corrected paired sample t-test (bar plot
comparisons) or a Mann-Whitney U-test (boxplot comparisons). f, The
proportion of flies outside of the odour zone over the entire experiment. The
mean (solid line) and s.e.m. (transparency) are colour-coded for each density
(n is as for panels d, e).

As fly appendages also house taste receptors, we tested whether
mechanical stimulation was sufficient to elicit Encounter Responses by
tracking stationary flies following touch of appendages with a metallic
disc (Supplementary Video 4). We observed a stereotyped relationship
between the location of mechanical touch and subsequent walking 
trajectories (Fig. 3d), whose associated kinematics were indistinguish- 
able from those of Encounter Responses. Thus, mechanical touch alone 
can elicit Encounter Responses (compare Extended Data Fig. 3c and g; 
two-sample Kolmogorov-Smirnov test, P = 0.3). Consistently, genetic 
ablation of flies’ oenocytes, to remove cuticular hydrocarbon contact 
chemosensory signals, had no effect on the ability of these animals to
elicit Encounter Responses in wild-type flies (Fig. 3e). These data imply 
that Encounter Responses are mediated solely by mechanosensory 
stimulation. 

We next identified mechanosensory neurons required for touch- 
evoked Encounter Responses by driving tetanus toxin (Tnt) expression 
with a panel of candidate mechanosensory Gal4 lines (Extended Data 
Fig. 4a). R55B01-Gal4/UAS-Tnt flies exhibited significantly diminished 
Encounter Responses compared to a gustatory neuron driver line (Ex- 
tended Data Table 2), without reduced ability to produce sustained 
high-velocity walking bouts (Extended Data Fig. 4b). R55B01-Gal4- 
driven expression of a UAS-CD4:tdGFP reporter was detected in neurons 
innervating leg and wing neuropils of the thoracic ganglia (Extended 
Data Fig. 5a). Consistently, green fluorescent protein (GFP) labelled 
neurons in several leg mechanosensory structures: the femoral and tibial 
chordotonal organs, and distal leg mechanosensory sensilla neurons 
(Extended Data Fig. 5b). Notably, among the screened lines only R55B01-
Gal4 drove expression in leg mechanosensory sensilla (Extended Data
Fig. 4c, d), suggesting that these are the critical neurons for Encounter
Responses.

Figure 2 | Inter-fly Encounters coincide with odour responses and are
required for collective odour avoidance in simulations. a, Images of two flies
(left, white circles) undergoing an Encounter (middle, red circles) that results in
an Encounter Response (right, blue circle). b, Velocities and Encounters for
two flies exposed to CO2 at odour onset (grey dashed line). The top panel shows
the velocity for each fly. Cyan arrowheads indicate the first walking bout
initiated after odour onset (‘Odour walking response’). The bottom panel
shows when these flies are (white) or are not (black) undergoing an Encounter
during the same time period. c, The likelihood of an Encounter with respect to
the time of the odour walking response (blue line) or a randomly chosen
time point (grey line). Data are from Fig. 1d; density = 1.13 flies per cm² and
n = 200 flies. d, The frequency of Encounters as a function of group density.
Data are from Fig. 1d. e, The duration of walking bouts depending on whether
they are initiated in isolation (grey boxes) or during an Encounter (white
boxes). Data are from Fig. 1d. f, Simulated flies moved through a virtual arena as
a function of three parameters: spontaneous bout probability (‘Bout’),
Encounter Response probability (‘Encounter Response’), and turn away
probability from the air-odour interface (‘Turn away’). Low (small grey arrows)
or high (large black arrows) probabilities were experimentally determined
(Extended Data Fig. 2). g, The per cent time avoiding the odour (mean and s.d.)
for five densities of simulated flies (n = 80 experiments for each condition).
h, The sensitivity of simulated odour avoidance to Encounter Response
probabilities ranging from 0 (never responding to Encounters, blue) to 1
(always responding, yellow). Each coloured line indicates the mean odour
avoidance time (n = 10,902 experiments for each data point). The black line
indicates Probability = 0.8, taken from real fly data in Fig. 1. Black circles
indicate the mean fly avoidance times from Fig. 1d.

To ascertain the contribution to Encounter Responses of leg mecha- 
nosensory sensilla and/or chordotonal structures (which can also sense 
touch), we identified additional Gal4 driver lines that drove expres-
sion in subsets of these neuron classes. By intersecting piezo-Gal4 with
cha3-Gal80, a Gal4 suppression line, we could limit leg expression to
mechanosensory sensilla neurons (termed the ‘Mechanosensory Sensilla
driver’ line) (Fig. 3f). Importantly, silencing neurons with this driver
significantly diminished Encounter Response frequency (Fig. 3g). By 
contrast, silencing leg chordotonal organs alone had no effect on 
Encounter Response frequency (Extended Data Fig. 5a-c). 

Figure 3 | Leg mechanosensory sensilla neuron activity is necessary and
sufficient for Encounter Responses. a, The frequency of Encounter Responses
measured from experiments in Fig. 1 (‘Dark’), illuminated experiments
(‘Light’), near-anosmic mutants (IR8a¹;IR25a²;GR63a¹,ORCO¹),
proprioceptive mutants (nanchung³⁶ᵃ), nociceptive touch mutants (piezoᴷᴼ)
and gentle touch mutants (nompCᶠ⁰⁰⁹¹⁴). To calculate the frequency of
Encounter Responses, we tested how often each stationary fly undergoing
an Encounter moved continuously for the next half-second. n = 10
experiments for each condition (density = 0.75 flies per cm²). Reductions for
nanchung and piezo mutants were not statistically significant (Extended Data
Table 1). b, Single frames from a high-resolution video of an Encounter
between a moving fly and a stationary fly. The schematic on the middle frame
shows the per cent of all observed Encounter Responses resulting from
touch for each leg segment (n = 104 experiments). The Encounter Response
walking trajectory elicited by touch is shown in yellow on the right-hand frame.
c, Encounter Response trajectories (right) colour-coded by the appendage
touched by the neighbouring fly (left). Wings, W (n = 54 experiments); legs,
R1-R3 and L1-L3 (n = 21, 18, 19 and 23, 15, 17 experiments, respectively). The
scale bar is 1 mm and each trajectory represents up to 0.24 s of walking.
d, Touch response trajectories (right) colour-coded by which appendage was
touched by a metallic disc (left). Wings, W (n = 20 experiments); legs, R1-R3
and L1-L3 (n = 20, 21, 21 and 18, 21, 20 experiments, respectively). The scale
bar is 2.5 mm and each trajectory represents up to 1.5 s of walking. e, The
frequency of Encounter Responses elicited by moving flies with (‘oe+’) or
without (‘oe-’) cuticular hydrocarbon-secreting oenocytes (n = 11
experiments each). NS, not significant. f, A transmitted light image, inverted
fluorescence image (fluorescence in black), and summed fluorescence
(ΣGFP) for a Mechanosensory Sensilla driver fly leg expressing GFP
(UAS-CD4:tdGFP/piezo-Gal4;cha3-Gal80/+). Leg mechanosensory sensilla
(MS) are indicated in green. A high-resolution image of the tarsus is shown on
the right. Endogenous GFP fluorescence (green) is superimposed upon a
transmitted light image (magenta). The scale bars are 100 µm. g, The frequency
of Encounter Responses for parental line controls (UAS-Tnt/+;+ or piezo-
Gal4/+;cha3-Gal80/+), Mechanosensory Sensilla driver flies expressing an
inactive tetanus toxin control (UAS-Tntᴵᴹᴾ/piezo-Gal4;cha3-Gal80/+), or
Mechanosensory Sensilla driver flies expressing tetanus toxin (UAS-Tnt/piezo-
Gal4;cha3-Gal80/+). n = 12, 13, 15 and 15 experiments, respectively. h, Blue
laser optogenetic stimulation responses of flies expressing ChR2 in
mechanosensory sensilla (piezo-Gal4/+;cha3-Gal80/UAS-ChR2(T159C)) in
the absence (left) or presence (right) of the essential cofactor all trans-retinal
(n = 12 flies for each condition). Each box indicates the response for a single fly
(‘walk’, ‘leg shift’ or ‘none’).

Figure 4 | Encounter Responses are necessary and sufficient for collective
odour avoidance. a, b, The per cent time avoiding the odour (mean and s.d.)
for R55B01-Gal4 (a) or Mechanosensory Sensilla driver (b) flies expressing
an inactive tetanus toxin control, or tetanus toxin. n = 22, 21, 22, and 19
experiments for R55B01-Gal4 and n = 23, 21, 21, and 21 experiments for the
Mechanosensory Sensilla driver (genotypes: UAS-Tntᴵᴹᴾ;R55B01-Gal4,
UAS-Tnt;R55B01-Gal4, UAS-Tntᴵᴹᴾ/piezo-Gal4;cha3-Gal80/+, UAS-Tnt/piezo-
Gal4;cha3-Gal80/+). c, The per cent time avoiding the odour (mean and s.d.)
for heterozygous control, or homozygous nompCᶠ⁰⁰⁹¹⁴ mutant animals.
n = 22, 22, 21, and 21, respectively. d, e, The per cent time avoiding the
odour (mean and s.d.) for individual CO2-anosmic virtual and real flies
(GR63a¹,IR64aᴹᴮ⁰⁵²⁸³). Avoidance time is measured from a single ‘CO2-
anosmic’ fly per experiment in a simulated model (d, n = 80 experiments each),
or in Drosophila (e, n = 35, 37, 40 and 38 experiments) where single mutant
flies were tested for CO2 avoidance in the context of wild-type flies.

We tested the sufficiency of leg mechanosensory sensilla neuron
activity to elicit Encounter Response-like walking by expressing chan-
nelrhodopsin-2 (ChR2) in each class of leg mechanosensory neurons
and recording behavioural responses to blue light pulses. Optogenetic
stimulation of flies expressing ChR2 in leg mechanosensory sensilla 
neurons, but not chordotonal organs, resulted in Encounter Response- 
like walking (Fig. 3h; Extended Data Fig. 5d, Supplementary Videos 5 
and 6), consistent with natural elicitation of Encounter Responses by 
inter-fly touch of distal leg segments (Fig. 3b, inset). 

Our identification of a neuronal basis for Encounter Responses 
allowed us to test our model’s prediction (Fig. 2h) that inter-fly inter- 
actions are required for collective odour avoidance. First, we silenced 
leg mechanosensory sensilla neurons by expressing Tnt with R55B01- 
Gal4 or the Mechanosensory Sensilla driver. Second, we studied nompC 
mutants. Each of these perturbations abolished collective odour avoid- 
ance (Fig. 4a-c), supporting the link between mechanosensation and 
group behaviour. 

Touch may enhance odour avoidance by increasing awareness of 
the stimulus. Alternatively, touch may produce an odour-independent 
Encounter Response reaction that initiates departure from the odour 
zone. To distinguish between these possibilities, we asked if odour- 
insensitive flies displayed increased avoidance in the presence of 
odour-sensitive animals. Indeed, both in simulations (Fig. 4d) and in 
real flies (Fig. 4e), increasing the number of odour-sensitive indivi- 
duals led to greater avoidance behaviour of odour-insensitive indivi- 
duals. Thus, in this context, touch-mediated modulation of odour 
awareness plays little, if any, role in collective avoidance.

Combining systems-level and neurogenetic approaches, we have
uncovered a hierarchy of mechanisms that drive collective motion in 
Drosophila. Active flies elicit spatially stereotyped walking responses 
in stationary flies through appendage touch interactions, requiring 
the NOMPC mechanosensory channel and distal leg mechanosensory 
sensilla neurons. Through Encounter Responses, odour reactions of 
sensitive flies spark cascades of directed locomotion of less sensitive (or 
even insensitive) individuals, causing a coherent departure from the 
odour zone. This behavioural positive feedback and group motion are 
absent among flies in the non-odour zone since they are less likely to 
initiate walking and, consequently, have a reduced frequency of Encoun-
ters. Additionally, flies retreat when encountering the odour while trans-
iting from the air zone. Together these behavioural phenomena cause
flies to escape the odour zone and then remain in the air zone, resulting 
in higher odour avoidance for groups compared to isolated animals 
(Extended Data Fig. 6). When distal appendage mechanosensory touch 
detection is impaired, groups of flies cannot produce Encounter Re-
sponses, are less likely to leave the odour zone, and instead behave like
isolated flies. Encounters are likely to have widespread influence on
sensory-evoked actions of individuals in groups. For example, move-
ment of flies towards areas of high elevation is also increased in higher
density groups (Extended Data Fig. 7). 

Behaviour in animal groups arises from the detection and response 
to intentional and unintentional signals of conspecifics. While neural 
circuits controlling pairwise interactions, such as courtship, are increas-
ingly well-understood, we know little about those orchestrating group-
level behaviours. The identification of sensory pathways that mediate
collective behaviour in Drosophila opens the possibility to understand 
the neural basis by which an individual’s actions may influence—and be 
influenced by—group dynamics. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper.

METHODS


Drosophila lines. Actin88F-eGFP (ref. 22) backcrossed for five generations to w¹¹¹⁸ was
used as the wild-type line, enabling distinction from non-fluorescent mutant flies
in Fluorescence Behavioural Imaging experiments (Fig. 3a, e, g and Fig. 4e and
Extended Data Fig. 5c).

GR63a¹,IR64aᴹᴮ⁰⁵²⁸³ mutant flies were used as the CO2-anosmic individuals
(Fig. 4e).

IR8a¹;IR25a²;GR63a¹,ORCO¹ quadruple mutant flies were used to measure the
influence of olfaction on the frequency of Encounter Responses (Fig. 3a).

nanchung³⁶ᵃ (ref. 23), piezoᴷᴼ (ref. 24) and nompCᶠ⁰⁰⁹¹⁴ (ref. 5) mutant flies were
used to measure the impact of mechanosensing on the frequency of Encounter 
Responses (Fig. 3a). 

P{GMR-Gal4}attP2 transgenic flies were used to identify neural populations
with deficient touch responses (Fig. 3g, and Extended Data Figs 4, 5). Gal4 drivers
were selected by pre-screening a large panel (http://flweb.janelia.org/) for those
displaying sparse expression in neurons that projected from the legs to the thoracic
ganglia and neurons innervating the antennal mechanosensory and motor centre
in the brain. To identify R55B01-Gal4, we compared the frequency of Encounter
Responses in animals bearing these transgenes against that of a control driver,
R27B07-Gal4, which drives a gustatory pattern of expression in the legs and
thoracic ganglia (Extended Data Fig. 4a, green and data not shown). Brain
expression in R55B01-Gal4 was limited to neurons projecting to the antennal
mechanosensory and motor centre, with weaker expression in those innervating
the suboesophageal zone (which receives both gustatory and mechanosensory 
input from the labellum) and several visual areas (optic lobes and optic tubercle; 
Extended Data Fig. 5a). These weakly marked neural populations are likely to 
contribute only minimally to Encounter Responses as labellar touch was never 
observed and all experiments were performed in the dark. Finally, we also observed 
fan-shaped body expression in R55B01-Gal4. However, inhibiting the fan-shaped 
body cannot explain Encounter Response reductions in R55B01-Gal4 since silen- 
cing these neurons alone has no effect on Encounter Response frequency (Extended 
Data Fig. 5c, R65C03-Gal4). 

piezo-Gal4;cha3-Gal80 flies were used to target mechanosensory sensilla neurons. 

UAS-Tnt and UAS-Tntᴵᴹᴾ flies were used to measure the effects of neural
knockdown on Encounter Responses and collective behaviour (Figs 3g, 4a, b, 
and Extended Data Fig. 5c). 

UAST-ChR2(T159C) flies were generated by cloning ChR2(T159C) (ref. 28) 
into pgUASTgattB (refs 29, 30) and inserting this transgene into the attP2 site (Genetic
Services, Inc., Cambridge MA, USA). UAS-ChR2 flies were then crossed with Gal4 
driver lines for channelrhodopsin-2 stimulation experiments (Fig. 3h, and Extended 
Data Fig. 5d). 

PBac{y[+mDint2] w[+mC] = UAS-CD4:tdGFP}VK00033 flies were used to
visualise Gal4 driver expression in leg, brain, and thoracic ganglia neurons (Fig. 3f 
and Extended Data Figs 4d, e and 5a, b). 

PromE(800)-Gal4 [4M],Tub:Gal80ts flies, UAS-StingerII, UAS-Hid/CyO flies 
and UAS-StingerII flies were used for oenocyte ablation experiments (Fig. 3e) as 
described previously.

Experimental and statistical conditions. Experiment sample sizes were chosen 
based on preliminary studies. If sample size constraints and proper experimental 
conditions were met, all experiments were included for subsequent analysis. 
Experiments for different conditions and genotypes were interleaved to minimise 
the effects of time-of-day on behavioural results. Owing to the automated nature of 
almost all data acquisition and analysis, the experimenter was not blinded. For 
data meeting the criteria of normality, bar plots are presented and parametric 
statistical tests were used. For other data, boxplots and non-parametric statistics 
were used. Groups with similar variance are compared throughout the study. 
Arena design and flow simulation. Arenas were designed using the 3D CAD 
software, SolidWorks (Dassault Systemes, Waltham, Massachusetts, USA) and 
CNC machined from polyoxymethylene and acrylic glass. Arena flow patterns 
were simulated using EasyCFD (http://www.easycfd.net) incorporating measured 
physical and flow parameter values (Extended Data Fig. 1d). 

Behavioural imaging and tracking. For low-resolution behavioural imaging, we 
used Fluorescence Behavioural Imaging (FBI) acquisition software and hard-
ware. In all cases, we used Ctrax for fly tracking, and data analysis was performed
using custom Matlab scripts (The Mathworks, Natick, Massachusetts, USA).

Behavioural experiments. All experiments were performed on adult female
Drosophila raised at 25 °C on a 12 h light:12 h dark cycle, 2-4 days post-eclosion
(with the exception of experiments in Extended Data Fig. 1b, which used male flies).
Experiments were performed in a temperature-controlled room at 25 °C, except 
for those in Fig. 3d,h, which were performed at 22°C. In all cases except for 
aggregation measurements, flies were starved in empty 50mm Petri dishes for 
3-6 h in humidified 25 °C incubators. Experiments were performed in either the 
morning or late afternoon, Zeitgeber time.

Aggregation measurements (Extended Data Fig. 1). Flies were starved for 24 h
at 25 °C in tubes humidified with moist Kimwipes. Ripe banana paste was pre-
pared on the day of experiments and placed into a 12.78 cm² dish. Experiments
were performed in either the morning or late afternoon, Zeitgeber time. Experi-
ments for summary data (Extended Data Fig. 1b) were filmed with a webcam
(Microsoft LifeCam Studio, Redmond, USA) for 90 min with images acquired
every 10 min. Flies were placed into a clean transparent box (23.5 cm × 25 cm
× 37.5 cm) with only red light illumination. To calculate densities, the number
of flies on the food source was calculated for each image and averaged from the 
30th to 90th minute. 

Collective odour avoidance - wild-type, neural knockdown, nompC, and
mixed wild-type/anosmic (Fig. 1 and Fig. 4a-c, e). For olfactory stimulation, pre-
mixed 5% CO2 or air (Messer Schweiz AG, Lenzburg, Switzerland) was flowed
through mass flow controllers (PKM SA, Lyss, Switzerland) at a regulated flow
rate of 500 ml min⁻¹ via computer-controlled solenoid valves (The Lee Company,
Westbrook, CT, USA). A custom-fabricated circuit board and software (sQuid,
http://lis.epfl.ch/squid/) controlled valves, illumination LEDs (Super Bright LEDs
Inc., St Louis, Missouri, USA), and acquisition cameras (Allied Vision Technologies,
Stadtroda, Germany). Flies were imaged in the olfactory arena using the following
illumination/olfactory stimulation protocol: (1) infrared/blue light; air both sides
(10 s); (2) infrared light, 5% CO2/air (2 min); (3) infrared/blue light; air both sides
(10 s).

The arena half with CO2 was varied across experiments to eliminate the effects
of other possible environmental asymmetries on behavioural results. Blue light 
was used in all cases to keep experiments consistent with mixed genotype FBI 
collective behaviour experiments (Fig. 4e). 

Collective negative gravitaxis (Extended Data Fig. 7). For negative gravitaxis 
experiments, we tilted the behavioural arena at a 22.5° incline for 2 min. Flies were 
placed near the lower portion of the arena and were illuminated with red light. The 
Negative Gravitaxis Index was calculated by averaging their position along the 
long axis of the arena (with values ranging from 0 (bottom of the arena) to 100 
(highest point of the arena)) during the second minute of the experiment. 
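
The index described above is a rescaled mean position. A minimal sketch follows, assuming positions are measured in millimetres from the bottom edge of the arena; the function name and argument names are hypothetical.

```python
def negative_gravitaxis_index(positions_mm, arena_length_mm):
    """Mean position along the arena's long axis, rescaled to 0-100.

    positions_mm: positions sampled during the second minute, measured
    from the bottom of the arena (0) towards its highest point.
    """
    if not positions_mm:
        raise ValueError("no positions supplied")
    scaled = [100.0 * p / arena_length_mm for p in positions_mm]
    return sum(scaled) / len(scaled)
```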
Encounter Response modality screen (Fig. 3a). Twelve wild-type flies (‘light’) or
mixtures of 6 wild-type and 6 mutant flies (using a GFP reporter and Fluorescence
Behavioural Imaging to distinguish genotypes) were imaged in the olfactory arena
using the following illumination/odorant protocol: (1) infrared/blue light; air both 
sides (10s); (2) infrared light, air both sides (5 min); (3) infrared/blue light, air both 
sides (10s). 

Blue light was used in all cases to keep experiments consistent with mixed 
genotype Encounter Response experiments. 

High-resolution inter-fly touch response (Fig. 3b, c and Extended Data Fig. 
3d, e). Four flies were imaged in a small arena (1 cm × 5 cm) backlit with infrared
light (Super Bright LEDs Inc. St Louis Missouri, USA). Images were continuously 
acquired at 125 frames per second (fps) using a high-speed video camera (Fastec 
Imaging, San Diego, CA, USA). The experimenter captured a video if a stationary 
fly exhibited touch-elicited walking. 

High-resolution mechanical touch response (Fig. 3d and Extended Data Fig. 
3f, g). Individual flies were imaged in a small arena (3 cm × 3 cm) illuminated by a
red ring light (FALCON Illumination MV, Offenau, Germany). Images were con- 
tinuously buffered at 20 fps using a high-resolution video camera (Gloor Instru- 
ments, Uster Switzerland). A small magnetic metallic disc (1 mm diameter) was 
directed to individual leg or wing appendages using a larger permanent magnet. 
The experimenter captured a video if a stationary fly exhibited touch-elicited walking.

Neural silencing Encounter Response screen (Extended Data Fig. 4a, b). Eighteen
flies expressing inactive Tnt, or Tnt under the control of a specific Gal4 driver, were
imaged in the group arena using the following illumination protocol: (1) infrared/ 
blue light (10 s); (2) infrared light (2 min); (3) infrared/blue light (30 s); (4) infrared 
light (2 min); (5) infrared/blue light (10s). 

Neural silencing Encounter Response frequency (Fig. 3g and Extended Data 
Fig. 5c). Six flies expressing UAS-Tnt/+, <driver>-Gal4/+, UAS-Tnt/<driver>-
Gal4, or UAS-Tntᴵᴹᴾ/<driver>-Gal4 were imaged in the presence of 6 wild-type
flies in the group arena using the following illumination protocol: (1) infrared/blue
light; air both sides (10 s); (2) infrared light, air both sides (2 min); (3) infrared/blue
light, air both sides (10 s).

Optogenetic stimulation (Fig. 3h and Extended Data Fig. 5d). Flies bearing 
UAS-ChR2(T159C) and the specified Gal4 driver were raised on food mixed either
with 2 mM all trans-retinal (‘ATR’, Sigma-Aldrich, St Louis, USA) or with the 95%
ethanol solvent alone. Individual flies (2-4 days post-eclosion) were imaged in a small
arena (3 cm × 3 cm) illuminated by a red ring light (FALCON Illumination MV,
Offenau, Germany). Images were continuously buffered at 20 fps using a high- 
resolution video camera (Gloor Instruments, Uster Switzerland). An optically 
coupled red laser (Thorlabs, Newton, USA) was aligned to target the fly’s thoracic 
segment. Stimulation consisted of a short (1 s) pulse of blue laser light (Coherent,
Santa Clara, USA). The experimenter video-recorded up to three stimulations per
fly at a spacing of approximately 2 min; scored responses were observed at least twice.

Behavioural analysis. To determine threshold values for fly motion, Encounters,
and Encounter Responses, we measured velocities, accelerations and distances that 
could conservatively account for a test data set of manually annotated events. To 
the best of our knowledge our results are qualitatively robust to small variations in 
these values. 

Percent of time avoiding odour (Fig. 1d, e, Fig. 2g, h, Fig. 4, and Extended Data 
Fig. 2e-h). To calculate odour avoidance, we measured the per cent of time that 
flies spent in the non-odour (air) zone during the experiment’s second minute. 
This time period was chosen since we observed that flies tend to reduce exploration 
after one minute; see Fig. 1f. ‘% of time avoiding odour’ = (time spent in the non-odour
zone during the second minute/1 min) × 100. To report quantitatively equivalent
values for experiments with different densities of flies, we resampled data using 
bootstrapping. This entailed randomly selecting a subgroup of experiments and, 
from these, one fly per experiment. We then averaged the odour avoidance for 
these flies to yield one result. We repeated this process a specified number of 
iterations to generate a distribution from which to report the mean and s.d. The 
number of iterations was closely linked to the average number of experiments. For 
example, in cases where n ≈ 40, the number of bootstrapping iterations was 40.
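
The avoidance measure and the resampling procedure above can be sketched as follows. This is an illustrative reading, not the analysis code used here: the function names, the `subgroup_size` argument and the choice to sample experiments without replacement are assumptions, since the text does not specify them.

```python
import random
import statistics


def percent_time_avoiding(in_air_zone):
    """in_air_zone: per-frame booleans for one fly during the second minute."""
    return 100.0 * sum(in_air_zone) / len(in_air_zone)


def bootstrap_avoidance(experiments, n_iterations, subgroup_size, seed=0):
    """experiments: list of experiments, each a list of per-fly avoidance (%).

    Each iteration draws a random subgroup of experiments (assumed here to
    be without replacement), picks one fly from each, and averages them.
    Returns the mean and s.d. of the resulting distribution.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_iterations):
        subgroup = rng.sample(experiments, subgroup_size)
        picks = [rng.choice(flies) for flies in subgroup]  # one fly each
        results.append(statistics.fmean(picks))
    return statistics.fmean(results), statistics.stdev(results)
```

Drawing one fly per experiment gives every density the same number of samples per iteration, which is what makes values comparable between sparse and dense groups.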
Walking bouts (Fig. 2e and Extended Data Fig. 2b). We measured activity bouts 
using a hysteresis threshold on forward velocity (Extended Data Fig. 2b) or velocity 
magnitude (Fig. 2e) to create a binary time-series. Bouts began when velocity 
exceeded a high threshold of 1 mm s⁻¹. Bouts ended when velocity was below
a low threshold of 0.5 mm s⁻¹. Short bouts or pauses (<2 frames or 100 ms, see
Extended Data Fig. 2b; <20 frames or 1s, see Fig. 2e) were removed by merging 
the fly’s state with neighbouring measurements. Bouts were also terminated when 
moving flies encountered obstacles including other flies. This can explain the 
decreasing Encounter-induced bout lengths observed at higher densities (Fig. 2e).
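
A hysteresis threshold of this kind can be sketched as below. The thresholds are those stated above; the function name and the merging rule (a short run takes the state of the run before it) are assumptions, and termination of bouts at obstacles is not modelled.

```python
def walking_bouts(velocity, high=1.0, low=0.5, min_frames=2):
    """Return one boolean per frame (True = walking).

    velocity: per-frame speed in mm/s. A bout starts when speed exceeds
    `high` and ends when it falls below `low`. Runs shorter than
    `min_frames` are merged into the preceding state (assumed rule).
    """
    state, walking = False, []
    for v in velocity:
        if not state and v > high:
            state = True       # bout begins above the high threshold
        elif state and v < low:
            state = False      # bout ends below the low threshold
        walking.append(state)
    i, n = 0, len(walking)
    while i < n:
        j = i
        while j < n and walking[j] == walking[i]:
            j += 1             # j marks the end of the current run
        if j - i < min_frames and i > 0:
            for k in range(i, j):
                walking[k] = walking[i - 1]
        i = j
    return walking
```

Using two thresholds rather than one prevents a fly whose speed hovers near 1 mm s⁻¹ from being split into many spurious bouts.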
Coherent motion index (Extended Data Fig. 1g, h). To measure the coherence 
of group motion away from or towards the odour zone, we calculated a coherent motion
index (CMI). We did this by first identifying walking flies at every time-point. For 
these flies, we identified the orientation of walking in a binary fashion: within the 
half-circle pointing towards the odour half of the arena or within the half-circle 
pointing towards the air half of the arena. The CMI for each time-point is: (no. of 
flies moving towards the air - no. of flies moving towards the odour)/total no. of 
moving flies. 

For a given experimental replicate, we average the CMI for the first ten seconds 
of odour presentation to capture the initial avoidance response. We report the 
distribution of this time-averaged CMI value across experimental replicates. For 
our analysis we examined the CMI for flies starting either in the air zone or the 
odour zone. Since the number of flies in a replicate can affect possible CMI values, 
comparisons should be limited to experiments with the same density of flies. 
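
The CMI definition above translates directly into code. In this sketch the function names are hypothetical, and time points with no walking flies are skipped when averaging, which is an assumption since the text does not say how they were handled.

```python
def coherent_motion_index(towards_air):
    """towards_air: one boolean per walking fly at a single time point.

    Returns (towards air - towards odour) / total walking, in [-1, 1],
    or None when no fly is walking.
    """
    n = len(towards_air)
    if n == 0:
        return None
    to_air = sum(towards_air)
    return (to_air - (n - to_air)) / n


def mean_cmi(per_timepoint):
    """Average the CMI over time points, skipping undefined ones."""
    values = [c for c in map(coherent_motion_index, per_timepoint)
              if c is not None]
    return sum(values) / len(values) if values else None
```

With two walking flies the index can only take the values -1, 0 or 1, whereas larger groups allow intermediate values, which illustrates why comparisons are restricted to equal densities.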
Encounter likelihood/frequency of Encounters (Fig. 2b-d). To calculate Encounter 
likelihood with respect to odour walking responses (Fig. 2b, c), we identified odour 
reactions as the time at which a stationary fly within the odour zone began moving 
(velocity magnitude >1 mm s⁻¹). As a control, a random time was selected from
the entire experiment. We then determined the times at which each fly was under- 
going an Encounter (distance to nearest neighbour <25% long-axis body length). 
Using these two data sets, we performed an event-triggered average of the En- 
counter time-series for all flies. 
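
An event-triggered average of this kind can be sketched as follows. The data layout (dictionaries keyed by a fly identifier), the `window` argument and the function names are assumptions made for illustration.

```python
def is_encounter(distance_mm, body_length_mm):
    """Encounter criterion: nearest neighbour within 25% of body length."""
    return distance_mm < 0.25 * body_length_mm


def event_triggered_average(encounter_series, event_frames, window):
    """Mean Encounter likelihood around each fly's event frame.

    encounter_series: dict fly_id -> list of 0/1 values, one per frame.
    event_frames: dict fly_id -> frame index of the event (odour walking
        response, or a randomly chosen control frame).
    Returns a list of length 2 * window + 1 centred on the event.
    """
    totals = [0] * (2 * window + 1)
    counts = [0] * (2 * window + 1)
    for fly, t0 in event_frames.items():
        series = encounter_series[fly]
        for k, t in enumerate(range(t0 - window, t0 + window + 1)):
            if 0 <= t < len(series):   # ignore frames outside the recording
                totals[k] += series[t]
                counts[k] += 1
    return [tot / c if c else float("nan") for tot, c in zip(totals, counts)]
```

Running the same function with randomly chosen frames in `event_frames` gives the control trace against which the odour-triggered trace is compared.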

Notably, the timing of the peak in Encounter likelihood is not of sufficient 
resolution to make inferences about causality. This is due to the inability to pre- 
cisely define a touch encounter in low-resolution video for which the legs are not 
visible. With Encounters, we instead rely on an estimate based on the overlap 
between two circles defining the peripheral space of neighbouring flies. Therefore 
Encounters can continue past the onset of motion since neighbouring flies may not 
have become distant enough to terminate the Encounter. This is illustrated in 
Fig. 2b in which the Encounters (white blocks) persist past the times of ‘odour 
walking response’ (blue arrowheads) for both flies shown. 

To calculate the frequency of Encounters for different group densities (Fig. 2d), 
we measured the proportion of time flies spent having Encounters during a given 
experiment. Notably, Encounters are a function of motion: flies that move are 
more likely to Encounter other flies. 

Encounter Response frequency (Fig. 3a,e,g, Extended Data Fig. 4a, and Ex- 
tended Data Fig. 5c). To calculate the Frequency of Encounter Responses, for 
each stationary fly (velocity magnitude <1 mm s⁻¹) undergoing an Encounter (dis-
tance to nearest neighbour <25% long-axis body length), we identified motion
events (velocity >1 mm s⁻¹ or angular velocity >2 rad s⁻¹ or acceleration mag-
nitude >15 mm s⁻²). If there was continuous motion for the next half-second (mean
velocity magnitude >5 mm s⁻¹) an Encounter Response occurred, otherwise not.
The average frequency across all flies in a given experiment was used to calculate 
summary data. Notably, the Encounter Response frequency is normalized by the 


number of Encounters: Frequency of Encounter Responses = Encounters pro- 
ducing walking reaction/(Encounters producing walking reaction + Encounters 
eliciting no reaction). Therefore this frequency is not a function of motion. For 
example, flies with high walking probabilities may generate more Encounters but 
reactions to these interactions—Encounter Responses—may still be more or less 
frequent. Similarly, flies that are predominantly stationary may have few Encounters 
but these too may result in a high or low frequency of Encounter Responses. 
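
The normalization above can be written as a short Python sketch. The function name and the example counts are illustrative assumptions; only the ratio itself comes from the text.

```python
def encounter_response_frequency(n_reaction: int, n_no_reaction: int) -> float:
    """Fraction of Encounters that produced a walking reaction.

    Returns NaN when a fly had no Encounters, because the ratio is undefined.
    """
    total = n_reaction + n_no_reaction
    if total == 0:
        return float("nan")
    return n_reaction / total


# Hypothetical counts: an active fly with many Encounters and a sedentary fly
# with few Encounters can still share the same response frequency.
active_fly = encounter_response_frequency(40, 10)
sedentary_fly = encounter_response_frequency(4, 1)
```

Because the count of Encounters appears in the denominator, the measure reflects how a fly reacts to an Encounter rather than how often Encounters occur.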
Encounter Response trajectories and kinematics (Extended Data Fig. 3b, c). 
To calculate Encounter Response trajectories (Extended Data Fig. 3b), for each 
stationary fly (velocity magnitude <1 mm s⁻¹ and angular velocity <2 rad s⁻¹)
undergoing an Encounter (distance to nearest neighbour <25% long-axis body 
length) near the centre of the arena (distance to wall >2 mm), we identified motion 
events (angular velocity ≥2 mm s⁻¹ or acceleration magnitude ≥15 mm s⁻²). The
position of the fly was recorded for the remaining frames until it stopped (velocity 
<1 mm s⁻¹) or became close to a new fly (distance to nearest neighbour <25% long-
axis body length) or to a wall (distance to wall <2 mm). Resulting response trajec- 
tories were pooled across experiments as a function of the octant of the Encounter 
(Extended Data Fig. 3a; the appropriate octant was identified as the region sur- 
rounding the fly that was bisected by a straight line between the fly’s centre of mass 
and that of the neighbouring fly). Encounter Response velocities were obtained for 
each of these trajectories and averaged to produce kinematic data. Boxplots were 
calculated by averaging over the first 500 ms of kinematic data (Extended Data Fig. 3c). 
Touch response trajectories and kinematics (Fig. 3c, d, Extended Data Figs 
3d-g). Trajectories were taken from raw tracking data of flies responding to touch. 
Trajectories ended when flies were near another fly or a wall. Each resulting 
response trajectory was pooled across experiments depending on the location of 
touch (for example, leg or wing). Touch response velocities were also obtained for 
each of these responses and averaged to recover kinematics. Boxplots were calculated
by averaging over the first 160 ms (Extended Data Fig. 3e) or 500 ms
(Extended Data Fig. 3g) of kinematic data. This discrepancy is due to the difference 
in frame-rate between the two measurements. 

Comparing response kinematics. Our aim was to compare the shape of kin- 
ematic data across Encounter responses, interfly touch responses, and mechanical 
touch responses. However, these data could be quite distinct with regard to spatial
and temporal resolution. Therefore, we first concatenated the median value from 
each of the common seven octants (excluding the front octant in the Encounter 
response data set) across each of three velocity measures (forwards, sideways, and 
angular velocities) yielding a vector with 21 data-points. We then normalized these 
21 element vectors to range from 0 to 1. These vectors were then compared using 
the 2-sample Kolmogorov-Smirnov test. 
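
The comparison step can be illustrated with a small, dependency-free Python sketch. It assumes each data set has already been reduced to a 21-element vector (7 octants × 3 velocity measures); the function names are hypothetical, and only the test statistic is computed here, not the P value reported in the paper.

```python
from bisect import bisect_right
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale a vector so that it ranges from 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant vector: no range to rescale
    return [(v - lo) / (hi - lo) for v in values]


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Normalizing first removes differences in overall scale between data sets, so the statistic compares the distribution of values rather than their absolute magnitudes.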

Simulations 

Simulated flies. To verify our model of collective odour avoidance we used an 
agent-based simulation driven by probabilistic behaviours (Fig. 2fh, Extended 
Data Fig. 2). The artificial flies had a circular body of 2.5 mm diameter and were 
placed in the arena of size 80 mm × 20 mm for 2,400 time-steps (corresponding to
120s of ‘simulated’ time). The odour was presented on one half of the arena during 
the entire simulation. Simulated flies walked with a constant speed of 0.51 mm per 
time-step in straight bouts, which were separated by periods of inactivity. At the 
beginning of each bout or when encountering an obstacle (a wall or another fly) 
each fly randomly changed its walking direction. The bouts were initiated either 
spontaneously in isolation or during an Encounter. 

Isolated bouts (Extended Data Fig. 2a, b). To estimate the propensities of flies to 
initiate walking in isolation, we performed 45 additional single fly experiments 
(density = 0.06) in which individual animals walked in the dark for 2 min. Flies 
were exposed to air throughout the entire arena in the first minute and odour 
during the second minute. For each fly i (i = 1, 2, ... 45), we integrated the differences
between its consecutive positions during the first minute of the experiment
(air) and separately during the second minute of the experiment (odour) at 20 Hz.
The minimum of these 45 × 2 values (that is, 29.9 mm) was treated as accumulated
noise and subtracted from all 90 values. Consequently, we obtained the total distance
travelled in air and in odour by each of the 45 flies (that is, $D_{\mathrm{air}}^{i}$ and $D_{\mathrm{odour}}^{i}$
for i = 1, 2, ... 45). To estimate bout durations, we rescaled these 90 values such
that their mean was equal to the mean duration of Isolated bouts observed in the
‘six flies’ experiments (density = 0.38). Overall, we obtained 45 values of prototypical
Isolated bout lengths initiated spontaneously in air and 45 values of prototypical
Isolated bout lengths initiated in odour (that is, $L_{\mathrm{air}}^{i} = 0.29\,D_{\mathrm{air}}^{i}$ and
$L_{\mathrm{odour}}^{i} = 0.29\,D_{\mathrm{odour}}^{i}$ for i = 1, 2, ... 45). Of note, the estimated bout lengths varied between
animals, and between air and odour for a single animal. 
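
The steps in this paragraph (integrate the path, subtract the smallest value as noise, rescale by 0.29) can be sketched in Python as follows. The function names and the input layout are assumptions made for illustration; the 0.29 factor is the value reported in the text, and the published Java implementation remains the authoritative version.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_length(positions: Sequence[Point]) -> float:
    """Sum of the distances between consecutive tracked positions (mm)."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))


def prototypical_bout_lengths(raw_air: Sequence[float],
                              raw_odour: Sequence[float],
                              scale: float = 0.29) -> Tuple[List[float], List[float]]:
    """Subtract accumulated noise, then rescale distances to bout lengths."""
    # The smallest of all raw values is treated as accumulated tracking noise.
    noise = min(list(raw_air) + list(raw_odour))
    l_air = [scale * (d - noise) for d in raw_air]
    l_odour = [scale * (d - noise) for d in raw_odour]
    return l_air, l_odour
```

With this procedure the fly and condition that travelled the least end up with a bout length of zero, which is consistent with treating the minimum as pure noise.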

We used the 45 values of $L_{\mathrm{air}}^{i}$ and the 45 values of $L_{\mathrm{odour}}^{i}$ to bootstrap the
behaviour of simulated flies. A simulated fly performed a self-induced bout of
length $L_{\mathrm{air}}^{s}$ if initiated in air, and of length $L_{\mathrm{odour}}^{s}$ if initiated in odour, where s
denoted which prototypical behaviour the simulated fly used. The value of s was set
at time-step 1 for each simulated fly independently and uniformly at random to an
integer value between 1 and 45. Thus, the values of s varied between the flies but it
was possible for some flies to have the same s value. The values of s were kept constant
for each simulated fly during all 2,400 time-steps. Consequently, each simulated
fly had a fixed propensity to move spontaneously. Moreover, within the same
group, simulated flies usually differed in their propensity to move spontaneously.

Between bouts, a simulated fly following the prototypical behaviour s remained
inactive for $(L_{\max} - L_{\mathrm{air}}^{s})/v$ time-steps when resting in air and for $(L_{\max} - L_{\mathrm{odour}}^{s})/v$
time-steps when resting in odour, where $L_{\max}$ is the maximal value of all 90
values of L (that is, $L_{\max}$ = 214 mm) and v is the walking speed. We estimated the
walking speed v as the maximum of $(D_{\mathrm{air}}^{i} + D_{\mathrm{odour}}^{i})$ over all i = 1, 2, ... 45 divided
by 120 s, which resulted in v = 10.2 mm s⁻¹ = 0.51 mm per time-step.
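
As a worked check of these numbers, the following Python sketch converts the walking speed to a per-time-step distance and computes the resting period for a given bout length. The 20 Hz step rate follows from 2,400 time-steps covering 120 s; the constant and function names are illustrative assumptions.

```python
STEP_RATE_HZ = 20.0                        # 2,400 time-steps span 120 s
V_MM_PER_S = 10.2                          # walking speed reported in the text
V_MM_PER_STEP = V_MM_PER_S / STEP_RATE_HZ  # 0.51 mm per time-step
L_MAX_MM = 214.0                           # longest prototypical bout length


def rest_steps(bout_length_mm: float,
               l_max: float = L_MAX_MM,
               v: float = V_MM_PER_STEP) -> float:
    """Number of inactive time-steps that follow a bout of the given length."""
    return (l_max - bout_length_mm) / v


def cycle_steps(bout_length_mm: float,
                l_max: float = L_MAX_MM,
                v: float = V_MM_PER_STEP) -> float:
    """Walking steps plus resting steps for one uninterrupted bout."""
    return bout_length_mm / v + rest_steps(bout_length_mm, l_max, v)
```

Under the constant-speed assumption, an uninterrupted bout plus its rest always lasts $L_{\max}/v$ time-steps, so flies with longer prototypical bouts simply spend a larger share of each cycle walking.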

Crossing the air-odour interface (Extended Data Fig. 2c). A simulated fly 
changed its direction of motion by 180 degrees when crossing from air to odour 
with probability P(turn away from odour) = 0.4, and when crossing from odour to 
air with probability P(turn away from air) = 0.2. The values of P(turn away from 
odour) and P(turn away from air) were estimated from 40 single fly (density = 
0.06) experiments taken from Fig. 1d in which animals walked freely in the dark 
for 2 min with odour exposure on one half of the arena. We calculated the time flies 
spent in the odour after crossing from air, and vice versa. We classified a crossing 
from one half of the arena to another as a ‘turn around’ if the time spent in the new
half was ≤3 s. Overall, we observed 76 crossings from air to odour out of which 31
were classified as a ‘turn around’, and 72 crossings from odour to air out of which 
16 were classified as a ‘turn around’. 
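
The two turning probabilities follow directly from these counts, as the short Python sketch below shows. The function names are hypothetical, and the treatment of the 3 s threshold as inclusive is an assumption made for the example.

```python
def is_turn_around(time_in_new_half_s: float, threshold_s: float = 3.0) -> bool:
    """Classify a crossing as a turn-around if the fly leaves the new half quickly.

    Assumption: the threshold is inclusive (time <= 3 s).
    """
    return time_in_new_half_s <= threshold_s


def turn_probability(n_turn_arounds: int, n_crossings: int) -> float:
    """Fraction of crossings that were classified as turn-arounds."""
    return n_turn_arounds / n_crossings


# Counts reported in the text.
p_turn_from_odour = turn_probability(31, 76)  # crossings from air into odour
p_turn_from_air = turn_probability(16, 72)    # crossings from odour into air
```

Rounded to one decimal place, these ratios give the values of 0.4 and 0.2 used in the simulation.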

Encounter-induced bouts (Extended Data Fig. 2d). In the simulation, at each 
time-step t and for each walking fly we detected if the fly encountered an obstacle. If
so, we checked whether in time-step t + 1 the fly’s body would overlap with a wall 
or with other flies’ bodies (assuming the fly would walk for 0.51 mm in the same 
direction it was heading). In these cases, the walking fly did not move in time-step 
t, but randomly changed its direction, and resumed the walk in time-step t + 1.

Moreover, if the walking fly encountered an inactive fly, it caused the encountered
fly to initiate a bout with probability P(Encounter Response) = 0.8 (from
Fig. 3a). The length of this Encounter Response bout was equal to $E(L_{\mathrm{air}}^{u})$ and to
$E(L_{\mathrm{odour}}^{u})$ when initiated in air and in odour, respectively. The value u is a random
integer between 1 and 45, and E is a mapping from the lengths of Isolated bouts to
the lengths of Encounter Response bouts. Note that in contrast to the lengths of
Isolated bouts (that is, $L_{\mathrm{air}}^{s}$ and $L_{\mathrm{odour}}^{s}$) where the value s was fixed for each fly at the
beginning of a simulation, here u was a random variable redrawn independently
for each Encounter Response. Consequently, simulated flies did not vary in their
propensity to move due to Encounters.

We did not explicitly encode directionality in the Encounter Response angle. 
However, we observed that since virtual flies cannot occupy the same space, station- 
ary flies would move on average away from the location of touch, an implicitly 
directional response. 

We estimated E using the data from the six fly experiments (density = 0.38) in 
which animals walked freely in green light illumination for 5 min without odour 
exposure. We observed 1,314 Isolated bouts and 618 Encounter-induced bouts. For 
all 1,932 bouts we calculated their lengths by integrating with temporal resolution of 
20 Hz the differences between the consecutive positions of a given fly. Next, for both 
types of bouts, we calculated the 0th, 1st, 2nd, ... 100th percentiles of their lengths,
created a scatter plot and calculated a double linear mapping from the lengths of 
Isolated bouts to the lengths of Encounter Response bouts, which fit the data best 
(that is, E(x) = 4.71x + 0.75 when x < 20 and E(x) = 1.04x + 70.69 otherwise). 
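
The fitted mapping can be written directly as a piecewise function. The Python sketch below uses the coefficients reported above; the function name is an assumption, and lengths are taken to be in millimetres.

```python
def encounter_bout_length(isolated_length_mm: float) -> float:
    """Map an Isolated bout length to an Encounter Response bout length (mm).

    Piecewise linear ("double linear") mapping E with the reported coefficients.
    """
    x = isolated_length_mm
    if x < 20:
        return 4.71 * x + 0.75
    return 1.04 * x + 70.69
```

The steep first segment means that short Isolated bout lengths map to much longer Encounter Response bouts, while beyond 20 mm the mapping grows almost one-to-one with a large offset.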
Experiments and sensitivity analyses (Fig. 2h and Extended Data Fig. 2e-h). 
Overall, there were six experiments. We performed one main experiment to test 
the collective behaviour of flies in two conditions: (1) all flies in the group were 
odour-sensitive (Fig. 2h) and (2) the first fly from the group was odour-insensitive 
(Fig. 4d). To simulate an odour-insensitive fly s we used $L'^{\,s}_{\mathrm{odour}} = L^{s}_{\mathrm{air}}$ in place of
$L^{s}_{\mathrm{odour}}$ values. Additionally, we performed five experiments, each corresponding
to a different sensitivity analysis. For each of these experiments there were eleven 
conditions, each corresponding to a different value of the investigated parameter. 

In the first experiment we varied the propensity to move due to Encounters by
setting P(Encounter Response) between 0 and 1 with a step size of 0.1 (Fig. 2h). In
the second experiment we varied the propensity to move in air (Extended Data Fig.
2e). To this end, we used $L'^{\,s}_{\mathrm{air}} = a_{\mathrm{air}} L^{s}_{\mathrm{air}}$ in place of the $L^{s}_{\mathrm{air}}$ value, and we set the
damping coefficient $a_{\mathrm{air}}$ from 0 to 1 with a step size of 0.1. In the third experiment
we varied the propensity to move in odour (Extended Data Fig. 2f). To this end, we
used $L'^{\,s}_{\mathrm{odour}} = a_{\mathrm{odour}} L^{s}_{\mathrm{odour}}$ in place of the $L^{s}_{\mathrm{odour}}$ value, and we set the damping
coefficient $a_{\mathrm{odour}}$ from 0 to 1 with a step size of 0.1. In the fourth experiment we
varied the probability to turn back when crossing the interface from odour to air by
setting P(turn away from air) between 0 and 1 with a step size of 0.1 (Extended
Data Fig. 2g). In the fifth experiment we varied the probability to turn back when
crossing the interface from air to odour by setting P(turn away from odour)
between 0 and 1 with a step size of 0.1 (Extended Data Fig. 2h).

Odour avoidance. In both conditions of the main experiment, and for each pair of 
5 sensitivity experiments and 11 conditions, we ran simulations with 9 different 
group sizes. We used groups composed of n = 1, 3, 6, 9, 12, 15, 18, 21, and 24 
simulated flies (not all data reported). Overall, there were 1 × 2 × 9 (main) and
5 × 11 × 9 (sensitivity) lines of experiments. Each experimental line was replicated
22,000 times using a Mersenne Twister pseudo-random number generator with
a seed set to 1, 2, ... 22,000, respectively. The initial positions, initial directions and
the prototypical behaviours of simulated flies were identical between corresponding
replicates across different experimental lines.
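
The bookkeeping in this paragraph can be illustrated with a brief Python sketch. It counts the experimental lines and shows one way that seeding a Mersenne Twister generator per replicate keeps a fly's initial state identical across lines. The function name, the tuple layout and the uniform placement without overlap checks are assumptions for illustration and do not reproduce the published Java implementation.

```python
import random
from typing import List, Tuple

GROUP_SIZES = [1, 3, 6, 9, 12, 15, 18, 21, 24]
N_MAIN_LINES = 1 * 2 * len(GROUP_SIZES)          # 1 experiment x 2 conditions x 9 sizes
N_SENSITIVITY_LINES = 5 * 11 * len(GROUP_SIZES)  # 5 experiments x 11 conditions x 9 sizes

FlyState = Tuple[float, float, float, int]  # x (mm), y (mm), heading (deg), prototype s


def initial_state(seed: int, n_flies: int,
                  width_mm: float = 80.0, height_mm: float = 20.0,
                  n_prototypes: int = 45) -> List[FlyState]:
    """Draw initial positions, headings and prototypes for one replicate."""
    rng = random.Random(seed)  # CPython's random module is a Mersenne Twister
    flies = []
    for _ in range(n_flies):
        x = rng.uniform(0.0, width_mm)
        y = rng.uniform(0.0, height_mm)
        heading = rng.uniform(0.0, 360.0)
        s = rng.randint(1, n_prototypes)  # inclusive of both 1 and 45
        flies.append((x, y, heading, s))
    return flies
```

Because the flies are drawn in order from the same seeded stream, the first fly of replicate k starts identically whether the group holds 1 or 24 flies, which is the property the comparison across experimental lines relies on.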

The odour avoidance of a simulated fly was calculated as the proportion of time- 
steps the fly spent in air during time-steps 1,200 to 2,400 corresponding to the 
second minute of the experiment. To compare simulations’ outcomes between 
treatments, conditions and group sizes, we averaged in each experimental line the 
odour avoidance of the first simulated fly across all replicates in which the fly was 
initially placed in odour (there were 10,902 such replicates out of all 22,000 
replicates). Note that we chose to compare experimental lines based on the first 
simulated fly because it was the only fly used in all experimental lines. For example, 
the second and the third simulated flies were present in all experimental lines with 
groups composed of 3 or more flies, but were not present when the group was 
composed of just one fly. 
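
The odour avoidance measure and its averaging over replicates can be sketched in Python as follows. The input layout (one Boolean per time-step, true when the fly is in air) and the function names are assumptions made for the example.

```python
from typing import List, Sequence


def odour_avoidance(in_air: Sequence[bool],
                    first_step: int = 1200, last_step: int = 2400) -> float:
    """Proportion of time-steps spent in air between first_step and last_step.

    Time-steps are numbered from 1, so time-step t is stored at index t - 1.
    """
    window = in_air[first_step - 1:last_step]
    return sum(window) / len(window)


def mean_avoidance(per_replicate: Sequence[float],
                   started_in_odour: Sequence[bool]) -> float:
    """Average over the replicates in which the first fly began in odour."""
    kept: List[float] = [a for a, keep in zip(per_replicate, started_in_odour) if keep]
    return sum(kept) / len(kept)
```

Restricting the average to replicates in which the first fly began in odour, as described above, means that only flies that had to leave the odour contribute to the score.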

For more details see the simulation’s implementation in Java available on-line at: 

http://documents.epfl.ch/users/r/ra/ramdya/www/ramdya/collective_sim.html. 
Anatomical imaging 
Brain/thoracic ganglia staining and imaging (Extended Data Fig. 5a). Immunofluorescence
on whole-mount brains and thoracic ganglia was performed as described
previously. The primary antibodies were mouse monoclonal nc82 (1:10
dilution; Developmental Studies Hybridoma Bank) and rabbit anti-GFP (1:200, Invitrogen
A-6455). The secondary antibodies were Alexa Fluor 488- and Cy3-conjugated
goat anti-rabbit or anti-mouse IgG, respectively (Molecular Probes and Jackson 
ImmunoResearch) diluted 1:250. Microscopy was performed using an LSM 510 
laser scanning confocal microscope (Zeiss). 
Leg neuron imaging (Fig. 3f and Extended Data Figs 4c, d and 5b). Legs were 
removed from female adults 2 days post-eclosion and mounted in VectaShield 
under a coverslip. Cuticle was imaged with a 543 nm laser while CD4:tdGFP was 
imaged using a 488 nm laser. Microscopy was performed using an LSM 510 laser 
scanning confocal microscope (Zeiss). We reoriented leg images using a custom 
script to identify and crop the femur, tibia, and tarsal segments. Using these sub- 
images, we then quantified fluorescence values (excluding autofluorescence from 
the cuticle and surface debris) orthogonal to the long axis of each leg segment to 
produce a profile of leg mechanosensory structures. Chordotonal organs and 
mechanosensory sensilla neurons were distinguished by morphology: sensilla 
neurons had small somata with dendrites projecting to the base of leg sensilla 
(Extended Data Fig. 4d). 
Extended Data Figure 1 | Drosophila aggregate and move coherently at high
densities. a, Images at the start (left) and end (right) of a ~3 h video recording 
with 100 flies (50 male and 50 female) moving within a large container 
containing a banana paste dish (left) and an agarose dish (right). b, Fly densities 
on the banana paste dish for each gender or mixture of genders averaged 
from the 30th through 60th minute of a 90 min experiment (n = 4 experiments 
for each genotype). c, The arena for simultaneous odour stimulation and 
behaviour tracking of Drosophila groups. d, Laminar flow and odour 
localization validation using simulated fluid dynamics. High velocity vectors 
(yellow/red) are present at the odour entry and exit ports while lower, uniform 
velocity vectors (green/blue) are located within the arena. e, A histogram 
showing the per cent of time avoiding the odour for all flies in all experiments 
and for each density (colour-coded). Data are from Fig. 1d. f, The per cent 
time avoiding the odour (mean and s.d.) for five different densities of the subset
of flies starting in the odour zone that have at some point entered the air zone
(n = 37, 38, 36, 35, and 38 experiments for 0.06, 0.38, 0.75, 1.13, and 1.5 flies
per cm², respectively). In contrast to Fig. 1d, the lack of density dependence
suggests that flies that leave the odour zone tend not to return. g, The formula 
for a Coherent Motion Index that captures the degree of motion in the same 
direction (top) and an example of coherent motion away from the odour 
zone by 9 out of 11 flies total (bottom, cyan). h, The Coherent Motion Index for 
flies in the air (white boxes) or odour (grey boxes) zones during the ten 
seconds following odour onset. Data are from Fig. 1d. Shown are the results 
across all tested densities (0.06-1.5 flies per cm²) for flies that began the
experiment in the odour (grey boxes) or the air zone (white boxes). n = 31-38
experiments. A single asterisk (*) denotes P < 0.05 and a double asterisk (**)
denotes P < 0.01 for a Bonferroni sign test comparing medians to 0.
Extended Data Figure 2 | Model parameter determination and the
sensitivity of simulated collective behaviour to parameter variation.
a, b, Individual freely walking flies were presented with 5% CO₂ (‘odour’) or air
across the entire arena for 1 min. Mean (solid line) and s.e.m. (translucent
shading) walking velocity magnitude (a) and forward bout probability
(b) before, during, and after an odour impulse (black, n = 45 flies) or an air
impulse control (blue, n = 43 flies). Bouts began when velocity exceeded a high
threshold of 1 mm s⁻¹. Bouts ended when velocity dipped below a low
threshold of 0.5 mm s⁻¹. Short bouts or pauses (<2 frames or 100 ms) were
removed by merging the fly’s current behavioural state with neighbouring
measurements. Grey indicates the period of odour presentation. c, Probability
for Drosophila to turn back when crossing the interface from odour to air
and vice versa after a given period of time. Data are from Fig. 1d
(density = 0.06). d, Scatter plots of Drosophila bout lengths during isolation
versus Encounter Response bout lengths (red dots) and the double-linear
function fitting the data (blue line). n = 16 experiments at density = 0.38 flies
per cm². The graph on the right is a zoom-in of that on the left (dashed box).
e-h, Sensitivity of simulated collective behaviour to $P(\mathrm{bout}_{\mathrm{air}})$ ranging from
Probability = 0 (blue, never initiating spontaneous walking in air) to
Probability = 1 (yellow, always initiating spontaneous walking in air)
(e), $P(\mathrm{bout}_{\mathrm{odour}})$ ranging from Probability = 0 (blue, never initiating
spontaneous walking in odour) to Probability = 1 (yellow, always initiating
spontaneous walking in odour) (f), P(turn around from air) ranging from
Probability = 0 (never turning around from the air zone, blue) to
Probability = 1 (always turning around from the air zone, yellow) (g), P(turn
away from odour) ranging from Probability = 0 (never turning around from
the odour zone, blue) to Probability = 1 (always turning around from the
odour zone, yellow) (h). In all panels, each coloured line indicates the mean per
cent time avoiding the odour across densities, the black line indicates the
simulation result for parameter values taken from real fly data, n = 10,902 for
all data-points, and superimposed are the mean values for real flies (black
circles).
Extended Data Figure 3 | Encounter Response kinematics for inter-fly or
metallic disc touches. a, Schematic of octant colour-coding. Each Encounter 
Response trajectory is assigned to the perimeter octant bisected by a line 
drawn to the nearest neighbouring fly during an Encounter. A head octant (red) 
is included here but these responses likely represent front leg touches. b, The 
mean (solid lines) and standard error (translucent areas) for Encounter 
Response trajectories (right) colour-coded by the relative location of the 
neighbouring fly as in panel a. The scale bar is 1 mm. c, Boxplot of mean
forward (top), sideways (middle), and angular (bottom) velocities for the first 
0.5s of Encounter Responses (n = 112-244 Encounters with duration >0.5 s) 
in the olfactory avoidance experiment from Fig. 1d (density = 0.75 flies per
cm²). Velocities are colour-coded by octant. d, Schematic of touch-point
colour-coding for high-resolution inter-fly touch response experiments. Each 
walking trajectory is colour-coded by the appendage touched by a neighbouring 
fly. Data are from Fig. 3c. e, Boxplot of mean forward (top), sideways 
(middle), and angular (bottom) velocities for the first 0.16 s of touch responses. 
Velocities are colour-coded by touch-point. f, Schematic of touch-point colour- 
coding for mechanical touch response experiments. Each touch response 
trajectory is assigned to the appendage touched by a metallic disc. Data are from 
Fig. 3d. g, Boxplot of mean forward (top), sideways (middle), and angular 
(bottom) velocities for the first 0.5 s of touch responses. Velocities are colour- 
coded by touch-point. 
Extended Data Figure 4 | A behavioural screen for neurons mediating
Encounter Responses and their leg expression patterns. a, Frequency of 
Encounter Responses for each Gal4 driver expressing UAS-Tnt. Driver lines are 
sorted by median frequency of Encounter Responses. A single asterisk (*) 
indicates P < 0.05 for a Bonferroni-corrected Mann-Whitney U-test 
comparing a given line against a gustatory neuron expression line, R27B07- 
Gal4 (green). Density = 1.13 flies per cm² and n = 10 experiments for each line.
The selected line, R55B01-Gal4, drives expression in distal leg mechanosensory 
neurons (cyan). b, The fraction of flies in each experiment exhibiting walking 
velocities that meet the criteria for Encounter Responses (mean velocity 
magnitude greater than 5 mm s⁻¹ for more than 0.5 s) at any time during the
experiment. Lines are sorted and colour-coded as in panel a. c, The identity and 
leg expression patterns of Gal4 drivers tested in the screen. Black boxes denote 
the presence of a given cell class. A cyan outline indicates distal leg 
mechanosensory neuron expression. A red outline indicates thoracic ganglion
expression in lines with significant reductions in Encounter Response 
frequency. The expression pattern is also shown for piezo-Gal4, which was 
used in subsequent experiments to refine identification of the leg 
mechanosensory neuron class required for Encounter Responses. d, Tarsal 
segments for w; UAS-CD4:tdGFP;R55B01-Gal4 (left) and w;UAS- 
CD4:tdGFP;piezo-Gal4 (right) flies. Each tarsal segment is labelled from 
proximal to distal (T1-T5). Endogenous GFP fluorescence (green) is 
superimposed upon a transmitted light image (magenta). The scale bars are 
30 µm. Below is a high-resolution image of a mechanosensory sensilla neuron
on the tarsus of a w; UAS-CD4:tdGFP;R55B01-Gal4 fly. Endogenous GFP 
fluorescence (green) is superimposed on cuticular autofluorescence (magenta). 
The axon, cell body, and dendrite of this neuron are labelled. The scale bar is 
10 µm.


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 5 image. Panel labels: R55B01-Gal4, piezo-Gal4, piezo-Gal4;cha3-Gal80, R86D09-Gal4, R65C03-Gal4, R46H11-Gal4; fan-shaped body; thoracic ganglion; tibia, tarsus, fChO, tChO; stationary fly, moving fly; Gal4, Tnt, Inactive Tnt; optogenetic stimulation response (walk, groom, leg shift, none, jump); laser; all trans-Retinal.]

Extended Data Figure 5 | Leg mechanosensory sensilla neurons, but not 
chordotonal organs, are necessary and sufficient for Encounter Responses. 
We identified five lines expressing Gal4 in different subsets of mechanosensory 
neurons (R55B01-Gal4, piezo-Gal4, piezo-Gal4;cha3-Gal80, R86D09-Gal4, 
and R46H11-Gal4) and one line expressing Gal4 in the fan-shaped body 
(R65C03-Gal4) as a control for fan-shaped body expression in R55B01-Gal4. 
a, Brain and thoracic ganglion expression for Gal4 lines driving UAS- 
CD4:tdGFP. Immunostaining is shown for the neuropil marker nc82 (magenta) 
and CD4:tdGFP (green). Sensory neuron projections from the wings (‘W’) and
legs (R1-R3 and L1-L3) are labelled for R55B01-Gal4. Importantly, neurons
expressing GFP in the brains of R55B01-Gal4 and piezo-Gal4; cha3-Gal80 flies
are different, implying that they are not responsible for the production of
Encounter Responses. The scale bars are 40 µm. b, Transmitted light images,
inverted GFP fluorescence images (GFP indicated in black), and summed
fluorescence of Gal4 driver legs expressing CD4:tdGFP. Autofluorescent cuticle
and pretarsus debris are indicated in black. GFP expression is shown in green.
When present, the femoral chordotonal organ (‘fChO’), tibial chordotonal
organ (‘tChO’) and mechanosensory sensilla neurons (‘MS’) are labelled. The
scale bar is 100 µm. c, The frequency of Encounter Responses for a parental
control (‘Gal4’), Gal4 line neurons expressing an inactive tetanus toxin control
(‘Gal4’ and ‘Inactive Tnt’), or Gal4 line neurons expressing tetanus toxin (‘Gal4’
and ‘Tnt’). n = 10-15 experiments for each condition. d, Blue laser pulse
stimulation responses of Gal4 line flies expressing UAS-ChR2 in the absence
(left) or presence (right) of the essential cofactor all trans-Retinal (n = 6-12
flies for each condition). Each box indicates the response for a single fly (‘walk’,
‘groom’, ‘leg shift’, ‘none’, or ‘jump’).
Extended Data Figure 6 | Schematic of collective odour avoidance in
Drosophila. a, A group of flies experiences odour flow on the right half of the 
arena. The direction of odour or air flow is indicated by red and black arrows, 
respectively. Odour increases the probability of spontaneous walking (black 
fly). b, Walking increases the probability of encountering a stationary fly, 
producing an Encounter Response. c, Walking flies cause additional 
Encounters and a cascade of Encounter Responses in the odour zone.
d, Walking flies pass into the non-odour zone through interactions with the
arena walls and possibly by sensing the direction of odour flow. e, The influx of 
walking flies to the air zone results in additional Encounter Responses. f, The 
propensity to turn around at the air-odour interface (perhaps compounded by 
the effects of unknown aggregation pheromones) causes flies to remain in the 
air zone, resulting in odour avoidance. 
Extended Data Figure 7 | Collective negative gravitaxis in Drosophila. a, A
schematic of the negative gravitaxis experiment. Flies are placed at the lowest 
point of a behavioural arena tilted at 22.5°. The flies’ positions are normalized 
to the long-axis of the arena ranging from 0 (arena bottom, lowest elevation) 
to 100 (arena top, highest elevation). b, Image of flies (black triangles) and 
their trajectories during 1 s (black dotted lines) in the negative gravitaxis 
experiment. Shown are representative images of an experiment with one fly 
(density = 0.06 flies per cm²) and an experiment with 18 flies (density = 1.13
flies per cm²). Negative Gravitaxis Index value positions of 0 (lowest elevation
in the arena) and 100 (highest elevation in the arena) are shown (white-dashed 
lines). c, To obtain a Negative Gravitaxis Index for a given fly, its position was 
averaged during the second minute of the experiment. Shown are the mean and 
s.d. of Negative Gravitaxis Indices for wild-type animals at densities of either 
0.06 or 1.13 flies per cm² (n = 28 and 30 experiments, respectively). d, Images of
two flies (left, black triangles in black dashed box) undergoing an Encounter 
(middle, red dashed box) that results in an Encounter Response (right, blue 
dashed box) during a negative gravitaxis experiment. 
Extended Data Table 1 | P values for data in main figures

Figure  Comparison                   P value (uncorrected)
1d      6 vs. 1 fly                  3.37 x 10°
        12 vs. 1 fly                 4.20 x 10°
        18 vs. 1 fly                 4.96 x 107!
        24 vs. 1 fly                 6.77 x 10°8
2d      12 vs. 6 flies               1.67 x 10"
        18 vs. 12 flies              3.91 x 10"
        24 vs. 18 flies              4.50 x 10"
2e      6 flies, Enc. vs. Iso.       3.18 x 10%
        12 flies, Enc. vs. Iso.      3.78 x 10%
        18 flies, Enc. vs. Iso.      1.30 x 10°
2g      6 vs. 1 fly                  3.69 x 10%
        12 vs. 1 fly                 1.87 x 10°”
        18 vs. 1 fly                 4.99 x 10°'9
        24 vs. 1 fly                 2.27 x 10°
3a      Light vs. Dark               1.73 x 10?
        Anosmic vs. Dark             4.27 x 10"
        nanchung vs. Dark            1.73 x 107
        piezo vs. Dark               1.13 x 107
        nompC vs. Dark               1.00 x 10°
3e      oe+ vs. oe-                  5.55 x 107
3g      Tnt vs. Gal4>Tnt             1.28 x 10%
        Gal4 vs. Gal4>Tnt            2.74 x 10%
        Inactive Tnt vs. Gal4>Tnt    1.10 x 10°
4a      Inactive Tnt, 18 vs. 1 fly   2.99 x 10%
        Tnt, 18 vs. 1 fly            5.54 x 107
4b      Inactive Tnt, 18 vs. 1 fly   3.90 x 10%
        Tnt, 18 vs. 1 fly            3.75 x 107
4c      nompC*", 18 vs. 1 fly        1.12 x 10°
        nompC~, 18 vs. 1 fly         1.62 x 10"
4d      3 vs. 1 fly                  3.98 x 107
        6 vs. 1 fly                  2.98 x 10°
        12 vs. 1 fly                 3.12 x 108
4e      3 vs. 1 fly                  8.87 x 108
        6 vs. 1 fly                  1.76 x 10°®
        12 vs. 1 fly                 3.60 x 10°

Number of comparisons        3

The uncorrected P values for each main figure panel and its associated comparison are indicated. The number of comparisons used for post-hoc Bonferroni correction for multiple comparisons is also shown. 
Extended Data Table 2 | P values for data in Extended Data figures

Figure  Comparison                      P value (uncorrected)   Number of comparisons
1h      Air vs. Odour, 6 flies          8.90 x 107              1
        Air vs. Odour, 12 flies         2.90 x 103              1
        Air vs. Odour, 18 flies         2.16 x 10%              1
        Air vs. Odour, 24 flies         1.40 x 10°              1
4a      R65A11 vs. R27B07               5.01 x 10%              19
        R14F12 vs. R27B07               2.18 x 10%
        R20C06 vs. R27B07               8.15 x 10°
        R55B01 vs. R27B07               2.50 x 10°
        R13E04 vs. R27B07               5.10 x 10°
        R93A02 vs. R27B07               5.69 x 102
        R39D08 vs. R27B07               2.50 x 10°
        R95A11 vs. R27B07               1.26 x 102
        R41A08 vs. R27B07               7.94 x 102
        R46H11 vs. R27B07               1.78 x 107
        R59C08 vs. R27B07               1.81 x 107
        R46D02 vs. R27B07               4.02 x 10?
        R86D09 vs. R27B07               3.25 x 10"
        R22A04 vs. R27B07               2.37 x 107
        R39A11 vs. R27B07               4.18 x 102
        R27E02 vs. R27B07               2.01 x 10"
        R93D11 vs. R27B07               2.18 x 107
        R86G01 vs. R27B07               3.72 x 107
        R74B10 vs. R27B07               8.44 x 10"
5c      R55B01-Gal4                                             2
          Gal4 vs. Gal4>Tnt             1.10 x 10°
          Gal4>Inactive vs. Gal4>Tnt    1.70 x 10°
        piezo-Gal4                                              2
          Gal4 vs. Gal4>Tnt             1.24 x 104
          Gal4>Inactive vs. Gal4>Tnt    1.50 x 10“
        piezo-Gal4; cha3-Gal80                                  2
          Gal4 vs. Gal4>Tnt
          Gal4>Inactive vs. Gal4>Tnt    1.10 x 10
        R86D09-Gal4                                             2
          Gal4 vs. Gal4>Tnt             2.07 x 10"
          Gal4>Inactive vs. Gal4>Tnt    6.94 x 107
        R46H11-Gal4                                             2
          Gal4 vs. Gal4>Tnt             6.89 x 101
          Gal4>Inactive vs. Gal4>Tnt    3.51 x 10?
        R65C03-Gal4                                             2
          Gal4 vs. Gal4>Tnt
          Gal4>Inactive vs. Gal4>Tnt    2.62 x 107
7c      18 vs. 1 fly                    3.63 x 10"              1

The uncorrected P values for each Extended Data figure panel and its associated comparison are indicated. The number of comparisons used for post-hoc Bonferroni correction for multiple comparisons is 
also shown. 
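
As a minimal illustration of the Bonferroni correction mentioned in the table notes, the sketch below multiplies an uncorrected P value by the number of comparisons and caps the result at 1. The function name and the example numbers are illustrative assumptions, not values taken from the tables.

```python
def bonferroni(p_uncorrected, n_comparisons):
    """Return the Bonferroni-corrected P value, capped at 1.0."""
    if not 0.0 <= p_uncorrected <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if n_comparisons < 1:
        raise ValueError("number of comparisons must be at least 1")
    return min(1.0, p_uncorrected * n_comparisons)


if __name__ == "__main__":
    # Hypothetical example: an uncorrected P of 0.004 tested across 19 comparisons.
    corrected = bonferroni(0.004, 19)
    print(f"corrected P = {corrected:.3f}")  # compare against the chosen alpha
```

A comparison with a single test (number of comparisons = 1) is left unchanged by this correction.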
Identification of a mast-cell-specific receptor crucial
for pseudo-allergic drug reactions


Benjamin D. McNeil, Priyanka Pundir, Sonya Meeker, Liang Han, Bradley J. Undem, Marianna Kulka & Xinzhong Dong


Mast cells are primary effectors in allergic reactions, and may have
important roles in disease by secreting histamine and various inflammatory
and immunomodulatory substances. Although they are
classically activated by immunoglobulin (Ig)E antibodies, a unique
property of mast cells is their antibody-independent responsiveness
to a range of cationic substances, collectively called basic secretagogues,
including inflammatory peptides and drugs associated with
allergic-type reactions. The pathogenic roles of these substances
have prompted a decades-long search for their receptor(s). Here we
report that basic secretagogues activate mouse mast cells in vitro
and in vivo through a single receptor, Mrgprb2, the orthologue of
the human G-protein-coupled receptor MRGPRX2. Secretagogue-induced
histamine release, inflammation and airway contraction are
abolished in Mrgprb2-null mutant mice. Furthermore, we show that
most classes of US Food and Drug Administration (FDA)-approved
peptidergic drugs associated with allergic-type injection-site reactions
also activate Mrgprb2 and MRGPRX2, and that injection-site
inflammation is absent in mutant mice. Finally, we determine that
Mrgprb2 and MRGPRX2 are targets of many small-molecule drugs
associated with systemic pseudo-allergic, or anaphylactoid, reactions;
we show that drug-induced symptoms of anaphylactoid responses
are significantly reduced in knockout mice; and we identify a common
chemical motif in several of these molecules that may help predict
side effects of other compounds. These discoveries introduce a
mouse model to study mast cell activation by basic secretagogues
and identify MRGPRX2 as a potential therapeutic target to reduce a
subset of drug-induced adverse effects.

Responsiveness to basic secretagogues is conserved among mammals
and is also found in birds, indicating an ancient, fundamental role for
its mechanism. Many basic secretagogues are endogenous peptides,
often linked to inflammation; however, they activate connective tissue
mast cells only at high concentrations and independently of their canonical
receptors, so another mechanism of stimulation must exist. Several
candidate proteins that bind polycationic compounds have been proposed
as basic secretagogue receptors. Among these, MRGPRX2 has
been screened with the most compounds, and short interfering RNA
(siRNA) knockdown studies support at least a partial role for MRGPRX2
in activation by four non-canonical basic secretagogues. However,
no direct in vivo study or knockout model has been employed for any
candidate. The investigation of MRGPRX2 in mice is complicated because


Figure 1 | Mrgprb2 is the orthologue of human
MRGPRX2. a, Diagram of mouse and human
Mrgpr genomic loci. Mouse Mrgpra3 and Mrgprc11
are orthologues of human MRGPRX1, determined
by expression and ligand specificity. The
MRGPRX2 orthologue Mrgprb2 is described in
this study. Chr., chromosome. b, Results from a
stringent RT-PCR screen identifying Mrgprb2
transcript (arrow) in mouse peritoneal mast cells.
The negative control (Neg.) omitted reverse
transcriptase. RT-PCR for Mrgprb2 was repeated
at least four times. c, Example traces of intracellular
calcium concentrations [Ca²⁺]i, measured by
ratiometric Fura-2 imaging, from Mrgprb2-HEK
or MRGPRX2-HEK cells exposed to 20 µM
PAMP(9-20) (duration indicated by black line).
Each trace is a response from a unique cell.

d, Representative confocal images from BAC 
transgenic mouse tissues in which tdTomato 
expression is controlled by enhanced green 
fluorescent protein (eGFP)-Cre expression from 
the Mrgprb2 locus (see Methods). Avidin staining 
was used to identify mast cells. Percentages of 
avidin-positive mast cells that were also tdTomato 
positive: glabrous skin, 97.5%; hairy skin, 90.1%; 
trachea, 97.2%; heart, 87.1%. Percentages of 
tdTomato-positive cells that were also avidin 
positive: glabrous skin, 99.2%; hairy skin, 100%; 
trachea, 98.3%; heart, 99%. n = 3 mice and >300 
cells counted per tissue, except n = 2 and >100 
cells counted in the heart. Scale bar, 20 µm.
the gene cluster containing the four human MRGPRX members is dramatically
expanded in mice, consisting of 22 potential coding genes, many
with comparable sequence identity to MRGPRX2 (Fig. 1a). Therefore,
a mouse MRGPRX2 orthologue must be determined by expression pattern
and pharmacology. A stringent polymerase chain reaction with
reverse transcription (RT-PCR) screen in mouse primary mast cells
uncovered a band for a single family member, Mrgprb2 (Fig. 1b), whereas
MRGPRX1 orthologues were not expressed at relevant levels (Extended
Data Fig. 1a, b). Functionally, HEK293 cells heterologously expressing
mouse Mrgprb2 (Mrgprb2-HEK) responded to the MRGPRX2
agonist proadrenomedullin amino-terminal 20 peptide, fragment 9-20
(PAMP(9-20)) (Fig. 1c) and compound 48/80 (48/80), a classical mast
cell activator and canonical basic secretagogue (Extended Data Fig. 2).
Mrgprb2-HEK cells also responded to other MRGPRX2 ligands, including
the basic secretagogue substance P, but had no response to the
MRGPRX1 ligand chloroquine; no closely related family members in
mice responded to any compound (Extended Data Figs 1c and 2a, c).
To determine the expression of Mrgprb2, we generated Mrgprb2 bacte- 
rial artificial chromosome (BAC) transgenic mice in which the expres- 
sion of eGFP-Cre recombinase was under the control of the Mrgprb2 
promoter. Strikingly, Cre expression patterns indicate that Mrgprb2 
expression is highly specific to connective tissue mast cells (Fig. 1d and 
Extended Data Figs 3, 4). Together, the pharmacological and expres- 
sion data indicate that Mrgprb2 is the mouse orthologue of human 
MRGPRX2. 

Next, we determined whether Mrgprb2 is the basic secretagogue receptor
in mouse mast cells. The Mrgprb2 genomic locus contains too much
repetitive sequence to permit gene targeting through homologous recombination
(Extended Data Fig. 5a). Therefore, we used a zinc-finger-nuclease-based
strategy to generate a mouse line with a 4 base pair (bp)
deletion in the Mrgprb2 coding region (Mrgprb2MUT mice), resulting
in a frameshift mutation and early termination shortly after the first transmembrane
domain (Extended Data Fig. 5b-d). The mutation was stable
and inheritable (Extended Data Fig. 5c), so we regard Mrgprb2MUT
as a functional null. Mast cell numbers were comparable in tissues of
wild-type and Mrgprb2MUT mice, indicating that Mrgprb2 is not essential
for mast cell survival or targeting to tissue (Extended Data Fig. 6a).
The responsiveness of peritoneal mast cells to anti-IgE antibodies (Fig. 2a)
and endothelin (Extended Data Fig. 7) was also comparable, demonstrating
that Mrgprb2 mutation does not globally impair IgE or G-protein-coupled
receptor (GPCR)-mediated mast cell signalling. However, 48/80-induced
mast cell activation (Fig. 2a) and tissue histamine release
were essentially abolished in mutant mast cells (Fig. 2b and Extended
Data Fig. 6b). Furthermore, we found that 48/80-evoked tracheal contraction
(Fig. 2c) and hindpaw inflammation (extravasation and swelling;
Fig. 2d) were almost completely absent in an Mrgprb2MUT background,
while antigen (Fig. 2c) and anti-IgE evoked responses (Extended Data
Fig. 8) were comparable to those in wild-type mice. Finally, we found that four
additional basic secretagogues, as well as the MRGPRX2 agonists
PAMP(9-20) and cortistatin, strongly activated wild-type but not
Mrgprb2MUT mast cells (Fig. 2e and Extended Data Fig. 9a). HEK293 cells
expressing Mrgprb2 or MRGPRX2 (MRGPRX2-HEK) also responded
to these secretagogues (Extended Data Fig. 2). Taken together, these findings
lead us to conclude that Mrgprb2 is the mouse mast cell basic secretagogue receptor.
It is likely that the list of small, basic peptides that activate Mrgprb2 is
greater than the number in this study; indeed, dozens of such peptides
have been shown to activate mast cells. Notably, human MRGPRX2
is much more sensitive to substance P than mouse Mrgprb2 (Extended
Data Fig. 2c), suggesting a potential species-specific role for substance
P in mast cell signalling.
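
The frameshift reasoning behind the 4 bp deletion can be sketched in a few lines: a deletion whose length is not a multiple of three shifts the reading frame of every downstream codon. The sequence used here is a made-up toy example (an assumption for illustration), not the Mrgprb2 coding sequence.

```python
def causes_frameshift(deletion_length):
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


def codons(seq):
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    toy = "ATGGCTCTGACCGGATAA"      # hypothetical sequence, not from the paper
    deleted = toy[:6] + toy[10:]    # remove 4 bases after the second codon
    print(causes_frameshift(4))
    print(codons(toy))
    print(codons(deleted))          # codons after the deletion no longer match
```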

We next considered whether Mrgprb2 contributes to allergic-type reactions.
We specifically addressed drug-induced reactions because many
therapeutic drugs are cationic. Up to 15% of drug-induced adverse reactions
appear to be allergic in nature; however, many do not correlate well
with IgE antibody titre, indicating that antibody-independent, or pseudo-allergic,
mechanisms participate. We focused first on peptidergic drugs
Figure 2 | Mrgprb2 is the mouse mast cell basic secretagogue receptor.
a, Left, representative Fluo-4 fluorescence heat map images of mouse peritoneal
mast cells showing changes in [Ca²⁺]i induced by bath application of anti-IgE
(5 µg ml⁻¹) or 48/80 (10 µg ml⁻¹). Middle, representative imaging traces.
Each colour line represents an individual cell. Black lines in ‘anti-IgE’ panels
are average traces for each genotype. Note that [Ca²⁺]i traces are similar
between wild-type (WT) and mutant (MUT) groups. Right, quantification of
responding cells (n = 3 per genotype; >150 cells counted per condition).
Anti-IgE responses were not significantly different. Scale bar, 10 µm.
b, Histamine release into the supernatant from trachea and abdominal skin
from wild-type and Mrgprb2MUT mice after exposure to 48/80 (30 µg ml⁻¹)
for 30 min at 37 °C. n = 5 for trachea, n = 8 for skin. c, Top, representative
traces showing contractions of trachea isolated from wild-type and
Mrgprb2MUT mice (previously sensitized to ovalbumin (Ova)) in response to
48/80 (30 µg ml⁻¹) or ovalbumin (10 µg ml⁻¹; that is, IgE dependent). Bottom,
average data; maximum total contraction determined as response to 10 µM
carbamylcholine added at the end of the experiment. n = 5 for 48/80 wild type,
n = 3 for 48/80 Mrgprb2MUT. d, Left, representative images of Evans blue
stained extravasation 15 min after intraplantar injection of 48/80 (right, arrow,
10 µg ml⁻¹, 5 µl in saline) or saline (left). Right, quantification of Evans blue
leakage into the paw and paw thickness increase after 15 min. OD620 nm, optical
density at 620 nm. *P < 0.02 (n = 5 wild type, n = 6 Mrgprb2MUT). Differences
after saline injection were not significant. e, Quantification of wild-type
and Mrgprb2MUT mast cell responsiveness to MRGPRX2 ligands and basic
secretagogues, assayed using Fluo-4 imaging. Concentrations of substances
(in µM): PAMP(9-20), 20; cortistatin-14 (Cort.), 20; substance P (Sub P), 200;
kallidin, 200; mastoparan (masto.; a component of wasp venom), 20; vespid
mastoparan, 20. n = 3 per genotype; >150 cells counted per secretagogue. Data
are presented as mean ± standard error of the mean (s.e.m.). Two-tailed
unpaired Student’s t-test was used to determine significance in statistical
comparisons, and differences were considered significant at P < 0.05.
*P < 0.05, **P < 0.01.
because most are introduced subcutaneously or intramuscularly at millimolar
concentrations (Supplementary Information), high enough for
cationic peptides to activate mast cells. The most frequent allergic-type
response described in the FDA labels of these drugs is an injection-site
reaction (ISR), a local swelling and/or flare of variable size that can be
accompanied by pain or pruritus. In a survey of FDA-approved peptidergic
drugs, we found that the vast majority associated with ISRs are
cationic (Supplementary Information). We found that representative
members of all common, commercially available classes of these cationic
drugs activated mast cells in an Mrgprb2-dependent manner, whereas
the innocuous protein insulin had no effect (Fig. 3a and Extended Data
Fig. 9b, c). Consistently, all of these peptides except insulin activate both
Mrgprb2-HEK and MRGPRX2-HEK cells (Extended Data Fig. 2). We
selected the drug icatibant for further study because it induces ISRs in
nearly every patient. Icatibant at the clinical concentration induced
extensive extravasation and swelling, similar to human ISRs, in wild-type
mice but not in Mrgprb2MUT mice (Fig. 3b). Mice pretreated with
the mast cell stabilizer ketotifen also showed no inflammation (without
ketotifen: 40.7 ± 2.1% increase in paw thickness; with ketotifen:
3.1 ± 0.6% increase; n = 4 each; P = 2.2 × 10⁻⁶), strongly indicating
that mast cells mediated the inflammation. Furthermore, icatibant (as
well as positive controls 48/80 and mastoparan) induced histamine release
from wild-type peritoneal mast cells, whereas Mrgprb2MUT mast cells
released substantially less histamine (Fig. 3c). However, IgE-mediated
histamine release was unaffected by Mrgprb2 deletion (Fig. 3c). These
data lead us to anticipate that drug-induced ISRs may be alleviated by
targeting MRGPRX2 or by using peptides with less potent MRGPRX2
agonist properties.

Next, we explored the possibility that Mrgprb2 mediates pseudo-allergic
reactions induced by small molecules. We focused on intravenous
drugs because they are often administered rapidly and in high doses,
and thus are more likely to achieve high blood concentrations and
rapid tissue distribution than drugs administered through other routes.
Symptoms of pseudo-allergic reactions after intravenous administration,
which at their most severe are called anaphylactoid, include skin flushing
or rash, changes in blood pressure or heart rate, and bronchospasms.
We based our initial search on the structure of 48/80. While the structure-function
relationship of 48/80 as an MRGPRX2 agonist is unknown,
a cyclized variant containing a tetrahydroisoquinoline (THIQ) motif
(Fig. 4a) is reported to be seven times more potent than 48/80 as a mast
cell degranulator. A search of FDA-approved drugs containing a THIQ
recovered members of the nicotinic receptor antagonist non-steroidal
neuromuscular blocking drugs (NMBDs), including tubocurarine and
atracurium (Fig. 4b). NMBDs are used routinely in surgery to reduce
unwanted muscle movement and allow intratracheal intubation for
mechanical ventilation. Intriguingly, NMBDs alone are responsible
for nearly 60% of allergic reactions in a surgical setting, and all except
succinylcholine induce histamine release in humans. We found that
members of all NMBD families (Supplementary Information) except
succinylcholine activated mast cells in an Mrgprb2-dependent manner
at concentrations as low as 0.5% of the clinical injection concentration
(Fig. 4c and Extended Data Fig. 9d). Interestingly, rocuronium does not
contain a THIQ but has a bulky hydrophobic group with a charged nitrogen
within several angstroms (Fig. 4b), reminiscent of 48/80. Therefore,
we searched using modifications of the THIQ motif and the 48/80
structure, including changes in cyclization and position of the positive
or polar nitrogen, limiting our assay to intravenous drugs at high
injection concentrations. We identified the fluoroquinolone family of
antibiotics as having a similar motif (Fig. 4d). Like NMBDs, these are
associated with allergic-type reactions and can activate mast cells.
We found that the four members approved for intravenous use activated
Mrgprb2-HEK and MRGPRX2-HEK cells (Extended Data Fig. 2), and
Figure 3 | Mrgprb2 mediates mast cell responsiveness and side effects of
peptidergic therapeutic drugs. a, Percentage of responding cells from
wild-type (WT) and Mrgprb2MUT (MUT) peritoneal mast cells after drug
application, assayed using Fluo-4 imaging. Concentrations of drugs (in µg ml⁻¹):
icatibant, 50; cetrorelix, 20; leuprolide, 100; octreotide, 10; sermorelin, 60;
insulin, 80. n = 3 per genotype; >150 cells counted per substance, except >100
cells counted for insulin. Difference between insulin responsiveness was not
significant. b, Left, representative images of Evans blue stained extravasation
15 min after intraplantar injection of icatibant (right, arrow, 10 mg ml⁻¹,
5 µl in saline) or saline (left). Right, quantification of Evans blue leakage into
the paw after 15 min. n = 6 per genotype. Difference after saline injection was
not significant. c, Total histamine release from wild-type (red diamonds) and
Mrgprb2MUT (black squares) mice after incubation with named substances.
Note that no significant difference between wild-type and Mrgprb2MUT cells
was found at any dose of anti-IgE antibody. Experiments were repeated
>3 times. Data are presented as mean ± s.e.m. Two-tailed unpaired Student’s
t-test: *P < 0.05, **P < 0.01. NS, not significant.
Figure 4 | Mrgprb2 mediates mast cell responsiveness and side effects of
small-molecule therapeutic drugs. a, Structures of 48/80 and a cyclized
variant. The THIQ motif is highlighted in blue. b, Structures of representative
members of all NMBD classes (see Supplementary Information). THIQ
motifs are highlighted in blue. Note that only succinylcholine lacks a bulky
hydrophobic group. c, Percentage of responding cells from wild-type (WT) and
Mrgprb2MUT (MUT) peritoneal mast cells after application of various NMBDs,
assayed using Fluo-4 imaging. Concentrations of drugs (in µg ml⁻¹):
atracurium, 50; mivacurium, 20; tubocurarine, 30; rocuronium, 500. n = 3 mice
per genotype; >150 cells counted per substance. d, Structure of ciprofloxacin,
with the motif common to all fluoroquinolones highlighted in blue. Note
the nitrogens close to the quinolone motif. e, Percentage of responding cells
from wild-type and Mrgprb2MUT peritoneal mast cells after fluoroquinolone
application, assayed using Fluo-4 imaging. Concentrations of drugs (in
µg ml⁻¹): ciprofloxacin, 200; levofloxacin, 500; moxifloxacin, 160; ofloxacin,
400. n = 3 mice per genotype; >150 cells counted per substance. f, Changes
in body temperature after intravenous injection of ciprofloxacin (1.5 mg in
125 µl saline) at time 0. n = 4 mice per genotype. Data are presented as
mean ± s.e.m. Two-tailed unpaired Student’s t-test: *P < 0.05, **P < 0.01.


mast cells in an Mrgprb2-dependent manner (Fig. 4e and Extended
Data Fig. 9e). Correspondingly, atracurium and ciprofloxacin induced
histamine release in wild-type peritoneal mast cells and substantially
less so in Mrgprb2MUT mast cells (Fig. 3c). We selected ciprofloxacin for
in vivo tests of anaphylaxis, which in mice is measured most often by a
drop in body temperature, probably due to changes in blood pressure
and peripheral vasodilation. Rodents are nearly immune to histamine
toxicity at a systemic level, contrary to other experimental organisms,
but can be rendered sensitive to mast cell activators and secreted products
by pretreatment with β-adrenergic blockers. Under these conditions,
a high dose of ciprofloxacin induced a rapid drop in body temperature
that was very slow to recover in wild-type mice, while Mrgprb2MUT mice showed
a much smaller drop that recovered quickly (Fig. 4f). These results establish
that mast cell activation through Mrgprb2 is an off-target effect of
fluoroquinolones and other drugs.

Finally, we determined whether drugs associated with pseudo-allergies
activate human mast cells through MRGPRX2. We found that representative
members of each examined drug class evoked release of histamine,
tumour necrosis factor (TNF), prostaglandin D2 (PGD2) and
β-hexosaminidase from LAD2 cells (Extended Data Fig. 10a). 48/80 and
mastoparan were used as positive controls. Importantly, MRGPRX2-siRNA-treated
LAD2 cells exhibited significantly less β-hexosaminidase
release evoked by these substances compared with responses in control-siRNA-treated
cells, while IgE-mediated release was comparable (Extended
Data Fig. 10b). The residual β-hexosaminidase release observed
in MRGPRX2-siRNA-treated cells is probably due to incomplete messenger
RNA and/or protein knockdown.

Knowledge of the role of MRGPRX2 in drug-induced pseudo-allergies 
should expand further, for two reasons. First, ligand binding require- 
ment studies should enable more specific screens for drugs that cross- 
activate MRGPRX2. Second, screening orally administered drugs may 
uncover more MRGPRX2 ligands, since common side effects of orally 
administered drugs include gastrointestinal problems and headache, 
both of which may have a mast cell component. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Animal models. All experiments were performed in accordance with a protocol 
approved by the Animal Care and Use Committee at the Johns Hopkins University 
School of Medicine. All experiments involving equal treatments in wild-type and 
mutant samples and animals were conducted by experimenters blind to conditions. 
Analysis. Group data were expressed as mean ± s.e.m. Two-tailed unpaired Student’s
t-test was used to determine significance in statistical comparisons, and differences
were considered significant at P < 0.05. Statistical power analysis was used to justify
the sample size. We assumed the data were normally distributed since most
outcome values were symmetrically distributed around the mean value within each
group. The variance was similar between groups as determined by the F test. Mast
cells deemed to be damaged, either by visible lack of fibronectin adherence or by
abnormally high resting calcium levels, were excluded from analysis. Otherwise,
no samples or animals subjected to successful procedures and/or treatments were
excluded from the analysis. No randomization was used for animal studies since it
was not applicable to these studies.
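
For readers who want to check a comparison reported as mean ± s.e.m., the sketch below computes the pooled-variance (Student's) unpaired t statistic and its degrees of freedom from summary values alone. It is a minimal illustration, not the authors' analysis code; the function name is hypothetical, and the example inputs are the paw-thickness values quoted in the main text for the ketotifen comparison.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Student's unpaired t statistic and degrees of freedom from mean, s.e.m. and n."""
    # s.e.m. = s.d. / sqrt(n), so the sample variance is s.e.m.^2 * n.
    var1 = (sem1 ** 2) * n1
    var2 = (sem2 ** 2) * n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df


if __name__ == "__main__":
    # Without ketotifen: 40.7 +/- 2.1 %; with ketotifen: 3.1 +/- 0.6 %; n = 4 each.
    t_stat, dof = pooled_t_from_summary(40.7, 2.1, 4, 3.1, 0.6, 4)
    print(f"t = {t_stat:.2f}, d.f. = {dof}")
```

The two-tailed P value then follows from the t distribution with the returned degrees of freedom, which any standard statistics package can evaluate.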

Peptides and drugs. Compound 48/80, vespid mastoparan, rocuronium, tubocurarine,
ciprofloxacin, levofloxacin, moxifloxacin and ofloxacin were from Sigma.
Cortistatin was from Tocris Biosciences. PAMP(9-20) was custom synthesized and
purified to ≥98% by Genscript. Leuprolide was from Genscript. Substance P, kallidin,
mastoparan, cetrorelix, octreotide, sermorelin (growth hormone releasing factor
1-29) and icatibant (HOE-140) were from Anaspec. Atracurium and mivacurium
were from Santa Cruz Biotechnology. Recombinant human insulin was from Roche.
Goat anti-mouse IgE (Ab9162) was from Abcam.

Drug preparation and storage. Atracurium, mivacurium, tubocurarine and all
fluoroquinolone solutions were prepared on the day of the experiment because the
potencies of the first three were found to be susceptible to oxidation and/or freeze-thaw
effects, while the solubility of the fluoroquinolones was best when prepared
fresh. Propranolol was also prepared fresh on the day of the experiment to minimize
the chances of a loss in potency. All fluoroquinolones except levofloxacin were dissolved
into CIB adjusted to pH 3.5. All other drugs were prepared as 100-1,000×
aliquots and stored at −80 °C before thawing at 4 °C and diluting into calcium
imaging buffer or saline.

Mrgpr RT-PCR screen. RNA was purified from 4 X 10* mouse peritoneal mast 
cells with a Qiagen RNeasy Micro column, according to the manufacturer’s suggestions.
RNA was treated for 20 min with DNase I (New England BioLabs) and
re-purified on another RNeasy Micro column. Eight nanograms of RNA was used
to generate first-strand complementary DNA using a SuperScript III kit (Invitrogen)
according to the manufacturer’s instructions, using oligo dT primers and scaling
the recommended 10 µl reaction up to 60 µl. The negative control reaction was the
same except that SuperScript III reverse transcriptase was replaced by water. Twenty-five
microlitre PCR reactions were run with 12.5 µl RedTaq ReadyMix (Sigma),
0.5 µl dimethylsulphoxide (DMSO), 0.25 µl each of 50 µM gene-specific forward
and reverse primers, 10 µl water and 2 µl mixture from the cDNA or negative
control synthesis reactions. All reactions used a 4 min initial step at 95 °C, 30 s
annealing at specific temperatures (described later), 40 s extension at 72 °C, and
25 s at 95 °C (with the last three steps repeated 39 times), and a final 4 min step at
72 °C. Low stringency PCR was set to 60 °C annealing; otherwise, annealing temperatures
were: 62 °C for Mrgpra1, Mrgpra10, Mrgprb2 and Mrgprb6; 64 °C for
Mrgpra2, Mrgpra3, Mrgpra4, Mrgpra6, Mrgpra16, Mrgpra18 and Mrgprb11; 65 °C
for Mrgpra9, Mrgpra19, Mrgprb1, Mrgprb3, Mrgprb5 and Mrgprb8; 66 °C for
Mrgpra12 and Mrgprb10; 63 °C for Mrgprb4; 61 °C for Mrgpra14; and 65.5 °C
for Mrgprc11. Primers were as follows.
Mrgpra1, forward, ATCCAGCAAGAGGAATGGGG, reverse, TGTGACCTAGGAGGAAGAAGAAG;
Mrgpra2, forward, CCTCCTACACAAGCCAGCAA, reverse, AAGCACAAGTGAAAGATGATGCT;
Mrgpra3, forward, GCTACATCCAGCAAGAGGAATG, reverse, GCAAAAATTCCTTTGGGTAGGGT;
Mrgpra4, forward, CCTGTGTGCTGTGATCTGGT, reverse, TCACGGTTAATCCAGGGCAC;
Mrgpra6, forward, CATTTTCCTCCCCCAACAGT, reverse, ATGCCTGAATGAGCCCACAA;
Mrgpra9, forward, CAGTGATCTACATCCAGCAAAAGG, reverse, GCGTGGAAGCTATGATGCGA;
Mrgpra10, forward, CAGTGGTCCACCATCTCCAA, reverse, ACAGGCAAGAGAGTCATGGTT;
Mrgpra12, forward, TCAGGGATCGGGTGAAGCAG, reverse, GAGCATTTGAAGGTGTTGTTGGA;
Mrgpra14, forward, GGTTGCCCCTGTGTTTCTTC, reverse, TATTGCCAGTCAGTAAGCTGAG;
Mrgpra16, forward, GCCCTCTGGTTCCCATTACT, reverse, GTTTTTGGACCACTGAGGCATT;
Mrgpra18, forward, TGCTCTGGTTTTCTCCTTTGC, reverse, TGAGGCATGTCAAGTCAGTCA;
Mrgpra19, forward, CAGGACCCAGATCACGACAG, reverse, TCCTGGGCTTCCGATTTCAG;
Mrgprb1, forward, ATTAGCCTTCATCAGGCACCA, reverse, CCAGCCCAACTAAGGCAATG;
Mrgprb2, forward, GTCACAGACCAGTTTAACACTTCC, reverse, CAGCCATAGCCAGGTTGAGAA;
Mrgprb3, forward, ACCTGGCTGTGGCTGATTTT, reverse, GCTGAACCCACAGAGAACCA;
Mrgprb4, forward, TCTGGCTGGTGCTGATTTCTT, reverse, ACCACGAGGCTCAACAATAGA;
Mrgprb5, forward, CTGTGGTTCCTTCTGTGTCCA, reverse, TTTCCAGTTCCCCAGACCTTT;
Mrgprb6, forward, TCTGTCTACATCCTCAACCTGG, reverse, ATTATCTCATGAGGAAGGCTCAA;
Mrgprb8, forward, AGAGAATGCAAAGCATGCGA, reverse, GAGGAAGTTTGCCCCAGACA;
Mrgprb10, forward, CACTGGTCACATTGCCAACC, reverse, GGGGATGGAATCAATGTCCAAGA;
Mrgprb11, forward, ACCTTCTTGCTATTTTTCCCTCCA, reverse, AGGATGAGACTGGACCCACA;
Mrgprc11, forward, CAGCACAAGTCAGCTCCTCAA, reverse, ATGCCCATGAGAAAGGACAGAACC.
Expression constructs. Mrgpr genes were cloned and inserted into the pcDNA3.1 
mammalian expression plasmid using standard techniques. All mouse genes had a 
Kozak sequence at their amino terminus and also encoded a carboxy-terminal Flag 
tag separated from the genes by the amino acid linker DIIL. 

cDNA constructs. First-strand cDNA was prepared as described for RT-PCR 
screens, and amplification was performed using the Q5 HotStart High Fidelity Master 
Mix (New England Biolabs). At least five different clones each prepared from wild- 
type and mutant mice were sequenced to verify the presence of the deletion in the 
mutant and the absence of any other mutation from wild type or mutant. 
Calcium imaging in HEK293 cells. In initial screens, HEK293 cells (not tested for
mycoplasma but rapidly dividing) were transiently transfected with gene constructs
including a C-terminal Flag tag, and plated on 100 µg ml⁻¹ poly-D-lysine-coated
glass coverslips 6 h after transfection. Twenty-four hours later, cells were loaded
with AM esters of the calcium indicators Fura-2 or Fluo-4 (Molecular Probes)
along with 0.02% Pluronic F-127 (Molecular Probes) for 45 min at 37 °C. Fura-2-loaded
cells were imaged during 340 and 380 nm excitation, and Fluo-4-loaded
cells were imaged during 488 nm excitation. Later experiments used cell lines stably
expressing receptors along with transient or stable expression of the promiscuous
G protein Gα15. Cells were imaged in calcium imaging buffer (CIB; NaCl 125 mM,
KCl 3 mM, CaCl2 2.5 mM, MgCl2 0.6 mM, HEPES 10 mM, glucose 20 mM, NaHCO3
1.2 mM, sucrose 20 mM, brought to pH 7.4 with NaOH). Unless otherwise specified,
drugs were perfused into the chamber for 45 to 60 s and responses were
monitored at 5-s intervals for an additional 60-90 s.

EC50 determination. HEK293 cells stably expressing Gα15 and either Mrgprb2 or
MRGPRX2 were plated at 4 X 10° cells per well in 96-well plates and incubated 
overnight. The next day, media was removed and replaced with imaging solution
from the FLIPR Calcium 5 assay kit (Molecular Devices), diluted according to the
manufacturer’s suggestions in Hank’s balanced salt solution (HBSS) with 20 mM HEPES,
pH 7.4. Cells were incubated in 100 µl imaging solution at 37 °C for 60 min, and
allowed to recover for 15 min at room temperature before imaging in a Flexstation 3
(Molecular Devices). Wells were imaged according to the manufacturer’s specifications
for 120 s, with 50 µl of test substances at three times the final concentration added 30 s
after imaging began. Responses were determined by subtracting the minimum
signal from the maximum signal. Substances were tested in duplicate wells, the
signals were averaged, and EC50 values were determined for each trial by normalizing
to the peak response to the substance in that trial. All drugs were dissolved in
HBSS plus HEPES solution, with the following exceptions due to solubility issues:
cetrorelix acetate was dissolved in saline containing 2.5 mM CaCl2 and 0.6 mM
MgCl2, and fluoroquinolones except ofloxacin were dissolved in the same solution
except that the pH was adjusted with HCl to 3.5; ofloxacin required 100 µg ml⁻¹ of
lactic acid for full solubility. We also noticed that peptides sometimes lost potency
after a freeze-thaw cycle, so most peptides were prepared directly from lyophilized
stock.
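
The response calculation described above can be sketched as follows. The first two functions follow the text (maximum minus minimum signal per well, duplicate wells averaged, normalization to the peak response in the trial). The text does not state how the half-maximal concentration was then extracted, so the log-linear interpolation in the third function is an assumption made purely for illustration, and all function names are hypothetical.

```python
import math


def well_response(signal):
    """Response of one well: maximum signal minus minimum signal."""
    return max(signal) - min(signal)


def normalized_means(duplicate_responses):
    """Average duplicate wells, then normalize to the peak response in the trial."""
    means = [(a + b) / 2.0 for a, b in duplicate_responses]
    peak = max(means)
    return [m / peak for m in means]


def ec50_by_interpolation(concentrations, normalized):
    """Estimate EC50 by log-linear interpolation at 50% of peak (assumed method).

    Concentrations must be positive and sorted in ascending order.
    """
    points = list(zip(concentrations, normalized))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo < 0.5 <= r_hi:
            frac = (0.5 - r_lo) / (r_hi - r_lo)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ec50
    raise ValueError("responses do not cross 50% of the peak response")
```

In practice a sigmoidal (Hill-type) fit is the more common choice; the interpolation here only makes the normalization step concrete.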

Peritoneal mast cell purification and imaging. Adult male and female mice 2-5
months of age were killed through CO2 inhalation. A total of 12 ml of ice-cold mast
cell dissociation media (MCDM; HBSS with 3% fetal bovine serum and 10 mM
HEPES, pH 7.2) was used to make two sequential peritoneal lavages, which were
combined, and cells were spun down at 200g. The pellet from each mouse was resuspended
in 2 ml MCDM, layered over 4 ml of an isotonic 70% Percoll suspension
(2.8 ml Percoll, 320 µl 10× HBSS, 40 µl 1 M HEPES, 830 µl MCDM), and spun
down for 20 min, 500g, 4°C. Mast cells were recovered in the pellet. Purity was 
>95%, as assayed by avidin staining and by morphology. Mast cells were resuspended 
at 5 X 10°-1 X 10° cells ml ' in DMEM with 10% fetal bovine serum and 25 ng ml" 
recombinant mouse stem cell factor (Sigma), and plated onto glass coverslips coated 
with 30 µg ml⁻¹ fibronectin (Sigma). For counting, instead of plating, suspended
mast cells were diluted 1/10 and affixed to slides by spinning at 1,000 r.p.m. for 
5 min at 4°C on a CytoSpin (Thermo Scientific). 
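
As a quick arithmetic check on the Percoll recipe, the sketch below sums the listed component volumes and reports the resulting Percoll percentage and the HEPES concentration contributed by the 1 M stock. The function name and default arguments are illustrative; the small additional HEPES carried in with the MCDM is deliberately ignored, which is a simplifying assumption.

```python
def percoll_mix(percoll_ul=2800.0, hbss_10x_ul=320.0, hepes_1m_ul=40.0, mcdm_ul=830.0):
    """Return (total volume in ul, Percoll % v/v, HEPES in mM from the 1 M stock)."""
    total = percoll_ul + hbss_10x_ul + hepes_1m_ul + mcdm_ul
    percoll_pct = 100.0 * percoll_ul / total
    # 1 M = 1000 mM; HEPES already present in the MCDM is not counted here.
    hepes_mm = 1000.0 * hepes_1m_ul / total
    return total, percoll_pct, hepes_mm


if __name__ == "__main__":
    total, pct, hepes = percoll_mix()
    print(f"total = {total / 1000:.2f} ml, Percoll = {pct:.1f}%, HEPES = {hepes:.1f} mM")
```

The totals agree with the stated 4 ml of a 70% suspension to within rounding.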

For imaging, after 2 h of incubation at 37 °C, 5% CO₂, mast cells were loaded
with Fluo-4 along with 0.02% Pluronic F-127 for 30 min at room temperature,
washed three times in CIB and used immediately for imaging. Cells were used
within 2 h of loading. Cells were identified as responding if the [Ca²⁺]ᵢ rose by at
least 50% for at least 10 s, which clearly distinguishes a ligand-induced response
from random flickering events. Average traces were calculated by averaging the
responses of all cells from each mouse, and then averaging those per-mouse traces.
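
The responder criterion and the two-stage averaging can be sketched as follows. The baseline definition (mean of the pre-stimulus frames), the frame interval and all function names are assumptions made for illustration; the text specifies only the 50% rise, the 10 s minimum duration and the per-mouse averaging.

```python
import math

def is_responder(trace, baseline_frames, frame_interval_s,
                 min_rise=0.5, min_duration_s=10.0):
    """True if the signal stays at least 50% above baseline for at least 10 s.

    Assumption: baseline is the mean of the first `baseline_frames` samples,
    and each sample represents `frame_interval_s` seconds.
    """
    baseline = sum(trace[:baseline_frames]) / baseline_frames
    threshold = baseline * (1 + min_rise)
    frames_needed = math.ceil(min_duration_s / frame_interval_s)
    run = 0
    for value in trace[baseline_frames:]:
        run = run + 1 if value >= threshold else 0
        if run >= frames_needed:
            return True
    return False

def mean_trace(traces):
    """Point-by-point mean of equally long traces."""
    return [sum(values) / len(values) for values in zip(*traces)]

def average_across_mice(cells_by_mouse):
    """Average cells within each mouse first, then average the per-mouse traces."""
    return mean_trace([mean_trace(cells) for cells in cells_by_mouse.values()])

# Hypothetical traces sampled every 2 s, with 5 baseline frames.
sustained = [1.0] * 5 + [1.6] * 5 + [1.0] * 3   # above threshold for 10 s
flicker = [1.0] * 5 + [1.6] * 2 + [1.0] * 6     # brief event only
```

Averaging within each mouse first gives every animal equal weight, regardless of how many cells were imaged from it.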

BAC transgenic mice generation. We purchased the BAC clone RP23-65123 from 
the Children’s Hospital Oakland Research Institute. This clone contains the Mrgprb2
locus, ~60 kb of 5’ genomic sequence and over 100 kb of 3’ genomic sequence.
Recombineering in bacteria was used to introduce eGFP-Cre and a polyA signal 
immediately after the Mrgprb2 start codon*'. The BAC was linearized with NotI (New 
England Biolabs) and injected into pronuclei from single-cell-fertilized C57BL/6 
eggs. Eggs were implanted into pseudopregnant females. Three BAC mouse lines 
were established. Although mice were already in a C57BL/6 background, they 
were crossed for at least four generations to wild-type and tdTomato reporter mice 
in the C57BL/6 background before use in experiments. BAC mice were mated to 
ROSA26tdTomato mice purchased from Jackson Laboratories for imaging studies.
Experiments for Fig. 1 used mice homozygous for ROSA26tdTomato because the
tdTomato signal was often heterogeneous and weak in heterozygous mice. Genotyping
reactions for BAC mice were run at 61 °C annealing, and primers were:
forward, TATATCATGGCCGACAAGCA; reverse, CAGACCGCGCGCCTGAAGA.
Both primers are in the eGFP-Cre reading frame, but the entire gene and its
correct placement in the Mrgprb2 locus were verified by previous sequencing.

Mrgprb2 mutant mice generation. mRNAs encoding zinc finger nucleases targeting
Mrgprb2 were purchased from Sigma. The binding sites were GTTCCTGGGCATCCG
and TGCACACGAATGCCTTCACTG, corresponding to bases 180-194
and 196-216, respectively, of the Mrgprb2 open reading frame. mRNA was diluted
to 2 ng ml⁻¹ in 1 mM Tris-HCl buffer, pH 7.4, with 0.25 mM EDTA, and injected
into the pronuclei of single-cell-fertilized eggs in the C57BL/6 strain. No overt signs
of toxicity were observed. Embryos were implanted into pseudopregnant females.
DNA flanking the binding sites was amplified from founder mice and screened for
mutations using the Cel-1 assay kit (Transgenomics), according to the manufacturer’s
suggestions. Three of the first 28 mice were identified and confirmed by
DNA sequencing to carry small mutations, and no more screening was performed.
In addition to the 4 bp mutation used in this study, a mouse carrying a 1 bp deletion
and another with a 2 bp deletion were identified.

Wild-type and Mrgprb2MUT mouse genotyping. Primers used for wild-type mice
were GGTTCCTGGGCATCCGTAT and GGTTCCTGGGCATCCGTAT, and reactions
were run at an annealing temperature of 62.8 °C. Primers for Mrgprb2MUT
mice were GTTCCTGGGCATCCGCAC and CTTCCGCCTGAACCTTCGGT,
and reactions were run at 64.0 °C annealing temperature.
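
The logic of the allele-specific genotyping can be illustrated with the sequences given in Extended Data Fig. 5b. In this sketch the wild-type string has its extraction-damaged characters restored by comparison with the mutant line, which is an assumption, and the helper names are hypothetical. It recovers the deleted bases, confirms that a deletion whose length is not a multiple of three shifts the reading frame, and checks that each forward primer matches only its own allele.

```python
# Aligned sequences from Extended Data Fig. 5b (wild-type restored; see lead-in).
WT = "TTCCTGGGCATCCGTATGCACACGAATGCCTTCACTGTCTACATTCTCAACCTGGCTATG"
MUT_ALIGNED = "TTCCTGGGCATCC----GCACACGAATGCCTTCACTGTCTACATTCTCAACCTGGCTATG"
MUT = MUT_ALIGNED.replace("-", "")

# Forward primers as listed in the genotyping paragraph.
WT_PRIMER = "GGTTCCTGGGCATCCGTAT"
MUT_PRIMER = "GTTCCTGGGCATCCGCAC"

def deleted_bases(wt, mut_aligned):
    """Wild-type bases opposite the gap characters in the aligned mutant."""
    return "".join(w for w, m in zip(wt, mut_aligned) if m == "-")

def causes_frameshift(deletion_length):
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0

def primer_matches(primer, allele):
    """Check the primer against the allele, ignoring 5' bases outside the window."""
    start = primer.find(allele[:4])
    return start != -1 and primer[start:] in allele

deletion = deleted_bases(WT, MUT_ALIGNED)
```

The same frameshift test applies to the 1 bp and 2 bp deletions reported for the other two founder mice.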

Avidin labelling of tissue. Adult male and female mice up to 8 months of age were
anaesthetized with pentobarbital and perfused with 20 ml 0.1 M PBS (pH 7.4, 4 °C)
followed by 25 ml of fixative (4% formaldehyde (vol/vol), 4 °C). Heart, trachea and
skin sections were dissected from the perfused mice. Tissues were post-fixed in fixative
at 4 °C overnight. When skin sections were the only tissues needed, they were
dissected and placed in fixative directly after asphyxiation of mice by CO₂ inhalation,
eliminating the perfusion step. Tissues were cryoprotected in 20% sucrose
(wt/vol) for more than 24 h and were sectioned (20 µm width) with a cryostat. The
sections on slides were dried at 37 °C for 30 min, and fixed with 4% paraformaldehyde
at 21-23 °C for 10 min. The slides were pre-incubated in blocking solution
(10% normal goat serum (vol/vol), 0.2% Triton X-100 (vol/vol) in PBS, pH 7.4) for
1 or 2 h at 21-23 °C, then incubated with 1/500 FITC-avidin (Sigma) or rhodamine-avidin
(Vector Labs) for 45 min. Sections were washed three times with water or PBS
and a drop of Fluoromount G (SouthernBiotech) was added before coverslips were
placed on top. Heart mast cells were examined near cavities because the density
was much higher than elsewhere in the tissue; avidin-positive, tdTomato-negative
cells were observed embedded in muscle tissue in very low numbers, but their
identity was unclear.

For avidin labelling of peritoneal mast cells, cells were plated as described earlier, 
fixed with 4% paraformaldehyde at 21-23 °C for 10 min, incubated with 1/1,000 
avidin in PBS for 30 min at 21-23 °C, and washed with PBS before immediate 
imaging. 

Stomach section immunocytochemistry. Adult male and female mice up to 
8 months of age were anaesthetized with pentobarbital and perfused with 20 ml 
0.1 M PBS (pH 7.4, 4 °C) followed by 25 ml of fixative (4% formaldehyde (vol/vol), 
4 °C). Stomach sections were removed, washed thoroughly, postfixed in 4% form- 
aldehyde for 2 h, and prepared for sectioning by incubation in a 30% sucrose solu- 
tion for 48 h. Tissue samples were mounted in cryoembedding media and frozen, 
and 14 µm sections were made using a cryostat and then fixed onto slides. Slides
were washed with a 0.2% Triton X-100 PBS solution, incubated for 1 h in a 10%
normal goat serum solution, and then incubated overnight at 4 °C with a 1:20 dilution
of rat monoclonal anti-mouse MCPT1 (monoclonal antibody RF6.1, eBioscience)
in a 0.2% Triton/1% normal goat serum solution. Slides were washed
with the 0.2% Triton solution and incubated for 2 h at room temperature in Triton 
solution with a 1:500 dilution of a goat anti-rat IgG Alexa Fluor 488 conjugated 
antibody (Life Technologies). Slides were washed in PBS before coverslips were 
added with an anti-fade solution for imaging. 

Peripheral white blood cell preparation. Blood was collected from Mrgprb2-tdTomato
mice via cardiac punctures with a syringe containing PBS with 30 units
ml⁻¹ heparin and 5 mM EDTA, diluted 1:1 with the same solution, and allowed to
cool to room temperature before layering over 6 ml of a Histopaque-1119 solution
in a 15 ml conical tube. Tubes were centrifuged at 700g for 30 min, and white blood
cells were collected at the interface between the PBS and Histopaque solutions. Cells
were washed with PBS and spun down at 500g for 10 min a total of three times. Cells
were spun onto poly-lysine-coated slides in a Cytospin 4 (Thermo Scientific) at
600 r.p.m. for 3-5 min, dried overnight on a 37 °C heating block, and incubated for
2 min with Hoechst 33342 diluted to 0.5 µg ml⁻¹ in PBS before coverslip mounting
with an anti-fade solution. In parallel, we also stained cells in suspension with
Hoechst 33342, spun the cells down, and mixed the resuspended cells directly in a
PBS/anti-fade solution before placing directly onto slides and mounting coverslips
on the suspension. No tdTomato-positive cells were seen in any preparation using
either method.

Tissue histamine release studies. Whole tracheae or segments of skin isolated 
from the abdominal aspect of shaved male and female mice up to 6 months of age 
(4-8 mg wet weight) were dissected and cleaned of connective tissue. After a 60 min 
incubation period in oxygenated Krebs’ bicarbonate buffer solution (37 °C), the 
tissue was treated with either vehicle or compound 48/80 for 30 min. The supernatant
solution was saved for histamine analysis. The tissue was then subjected to
8% perchloric acid in a 37 °C water bath for 15 min to obtain total histamine content.
Histamine was assayed by the automated fluorometric technique previously
described”. 

Tracheal contractions. Tracheal contractions were carried out as previously 
described”’. For allergen (ovalbumin) responses, mice were actively sensitized by 
injecting 0.2 ml of an ovalbumin solution (3.75 µg ml⁻¹) mixed with Al(OH)₃ three
times at an interval of 2 days. Experiments were conducted on male and female
animals 8-12 weeks of age beginning 2 weeks after the first injection. Tracheae were
cleaned of connective tissue, and tracheal rings (whole or laterally divided in half)
were suspended between two tungsten stirrups in 10 ml organ chambers filled with
Krebs’ buffer that was warmed to 37 °C and bubbled with 95% O₂-5% CO₂ to
maintain a pH of 7.4. One stirrup was connected to a strain gauge (model FT03;
Grass Instruments), and tension was recorded on a Grass Model 7 polygraph (Grass
Instruments). Preparations were stretched to a resting tension of 0.2 g, and washed
with fresh Krebs’ buffer at 15-min intervals during a 60-min equilibration period.
After equilibration, tracheae were challenged with either ovalbumin (10 µg ml⁻¹)
or compound 48/80. At the end of each experiment, all tracheae were maximally
contracted with carbachol (1 µM). All results are expressed as a percentage of maximum
contraction.
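
The normalization to maximum contraction can be written out as a short calculation. This is a minimal sketch: the function name is hypothetical, and subtracting the resting tension before taking the ratio is an assumption, since the text states only that results are expressed as a percentage of the carbachol-induced maximum.

```python
def percent_of_max_contraction(response_g, carbachol_max_g, resting_g=0.2):
    """Contraction above resting tension, as a percentage of the carbachol maximum.

    All tensions are in grams; the 0.2 g default is the resting tension
    stated in the text. Baseline subtraction is an assumption of this sketch.
    """
    developed = response_g - resting_g
    maximum = carbachol_max_g - resting_g
    if maximum <= 0:
        raise ValueError("carbachol response must exceed resting tension")
    return 100 * developed / maximum

# Hypothetical recording: 0.5 g after challenge, 0.8 g after carbachol.
example = percent_of_max_contraction(0.5, 0.8)
```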

Hindpaw swelling and extravasation. Adult male mice up to 8 months of age
were anaesthetized with an intraperitoneal (i.p.) injection of 50 mg kg⁻¹ pentobarbital
(Sigma). Fifteen minutes after induction of anaesthesia, mice were injected
intravenously (i.v.) with 50 µl of 12.5 mg ml⁻¹ Evans blue (Sigma) in saline. Five
minutes later, 5 µl of the test substance (or 7 µl of anti-IgE) was administered by
intraplantar injection in one paw and saline was administered in the other paw.
Paw thickness was measured by callipers immediately after injection. Fifteen minutes
later (30 min after anti-IgE), paw thickness was measured again and mice were
killed by decapitation. Paw tissue was collected, dried for 24 h at 50 °C, and weighed.
Evans blue was extracted by a 24 h incubation in formamide at 50 °C, and the OD
was read at 620 nm using a spectrophotometer. For studies using ketotifen, mice
were injected i.p. with 25 µl of a 10 mg ml⁻¹ solution of ketotifen at the same time
as pentobarbital.

Systemic anaphylaxis assay. To minimize stress, animals were transported to 
the procedure area the day before injections. Adult male and female mice up to 
8 months of age (25 to 35 g) were given an i.p. injection of 80 µg propranolol in saline
(2 mg ml⁻¹) immediately after removal from their cages, and then placed back in
their cages for 30 min before intravenous injections. The intravenous injections
were performed on one mouse at a time. For each injection, a mouse was placed in a
transport box and brought to a room with no other mice, to minimize stress from
vocalizations during injection. The mouse was then placed in a restrainer, and the
injection was performed within 4 min of restraint because we observed that longer
restraint times affected body core temperature independently of the injection.
Tail veins were dilated by repeated wiping of the tail with a tissue soaked in 100% ethanol,
followed by injection of ciprofloxacin in a 0.25 ml Hamilton syringe fitted with a
30.5-gauge needle (BD Biosciences). The injection was determined to be successful
only when all of the criteria were met: blood appeared in the syringe after needle 
insertion, all tail veins were visible after injection, and the mouse bled slightly from 
the injection site after needle withdrawal. The injection site was swabbed until 
blood stopped flowing, the mouse was placed in a separate cage from its housing 
cage, one mouse per cage, and returned to the room it was brought from. At least one 
wild-type and one mutant mouse were used for each experimental session. Body 
core temperature was measured with a rectal thermometer. 

Mouse peritoneal mast cell histamine release assay. Mast cells were purified
as described earlier and allowed to recover for 2 h in DMEM with 10% FBS and
25 ng ml⁻¹ mouse stem cell factor in a 37 °C incubator with 5% CO₂. Cells were
then spun down, resuspended in CIB, counted, and plated at 300 cells per well in
75 µl CIB in 96-well plates coated with 20 µg ml⁻¹ fibronectin (Sigma). They were
allowed to adhere to the substrate for 45 min at 37 °C in atmospheric conditions
(that is, CO₂ levels were not adjusted) before assay. For the assays, cells were moved
to room temperature and 75 µl of 2× concentrations of tested substances (all in
CIB except for ciprofloxacin, which was in saline with 2.5 mM CaCl₂ and 0.6 mM
MgCl₂, pH 3.5) were added. After 5 min, 40 µl of supernatant was aspirated, diluted
with 40 µl CIB and frozen at −80 °C until histamine levels were determined. Anti-IgE
treatment was similar, except that cells were incubated for 30 min at 37 °C after
anti-IgE was added before aspiration of supernatant. Histamine content was determined
by using an HTRF histamine assay kit (Cisbio Assays) according to the
manufacturer’s instructions.

Human mast cell culture. LAD2 (Laboratory of Allergic Diseases 2) human mast
cells were cultured in StemPro-34 SFM medium (Life Technologies) supplemented
with 2 mM L-glutamine, 100 U ml⁻¹ penicillin, 50 µg ml⁻¹ streptomycin and
100 ng ml⁻¹ recombinant human stem cell factor (Peprotech). The cell suspensions
were seeded at a density of 0.1 × 10⁶ cells ml⁻¹ and maintained at 37 °C and 5%
CO₂, and periodically tested for the expression of CD117 and FcεRI by flow cytometry.
Cell culture medium was hemi-depleted every week with fresh medium.

LAD2 degranulation assay. LAD2 cells were sensitized for 20 h with 0.5 µg ml⁻¹
biotin-conjugated human IgE (Abbiotec). Cells were washed, resuspended in HEPES
buffer (10 mM HEPES, 137 mM NaCl, 2.7 mM KCl, 0.38 mM Na₂HPO₄·7H₂O,
5.6 mM glucose, 1.8 mM CaCl₂·2H₂O, 1.3 mM MgSO₄·7H₂O, 0.4% BSA, pH 7.4) at
0.025 × 10⁶ per well, and then stimulated with 0.1 µg ml⁻¹ streptavidin (Life Technologies)
or other agonists at the indicated concentrations for 30 min at 37 °C/5%
CO₂. The β-hexosaminidase released into the supernatants and in cell lysates was
quantified by hydrolysis of p-nitrophenyl N-acetyl-β-D-glucosaminide (Sigma-Aldrich)
in 0.1 M sodium citrate buffer (pH 4.5) for 90 min at 37 °C. The percentage of
β-hexosaminidase release was calculated as a per cent of total content. Agonists
tested were compound 48/80, mastoparan, icatibant, atracurium besylate and
ciprofloxacin hydrochloride.
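
The percentage-release calculation can be sketched as below. The function name is hypothetical, and the sketch assumes that total content is the sum of the supernatant and lysate signals after any blank subtraction, which the text implies but does not spell out.

```python
def percent_release(supernatant_signal, lysate_signal):
    """Beta-hexosaminidase release as a percentage of total cellular content.

    Assumption: total content = supernatant signal + lysate signal, with both
    values already corrected for the substrate blank.
    """
    total = supernatant_signal + lysate_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return 100 * supernatant_signal / total

# Hypothetical absorbance readings for one well.
example_release = percent_release(0.3, 0.9)
```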

Enzyme immunoassay and ELISA. LAD2 cells were washed with medium, suspended
at 0.25 × 10⁶ cells per well, and incubated with compound 48/80, mastoparan,
icatibant, atracurium or ciprofloxacin at the indicated concentrations for 3-24 h
at 37 °C/5% CO₂. Cell-free supernatants were harvested and analysed for PGD₂
release by an enzyme immunoassay (Cayman Chemical), while TNF content was
quantified using an ELISA kit (eBioscience) according to the manufacturer’s instructions.
The minimum detection limits were 55 pg ml⁻¹ for PGD₂ and 5.5 pg ml⁻¹
for TNF.

Measurement of histamine release from LAD2 cells. LAD2 cells were washed,
suspended in BSA-free HEPES buffer at 0.1 × 10⁶ per well, and incubated with
compound 48/80, mastoparan, icatibant, atracurium or ciprofloxacin at the indicated
concentrations for 30 min at 37 °C/5% CO₂. A histamine (Sigma-Aldrich)
stock solution of 100 µg ml⁻¹ was prepared and stored at −20 °C. The working
standards of 4,000 ng ml⁻¹ to 7.8 ng ml⁻¹ were freshly prepared using twofold serial
dilution. O-phthalaldehyde (OPT; Sigma-Aldrich) was dissolved in acetone-free
methanol (10 mg ml⁻¹) and kept in the dark at 4 °C. Histamine standards and cell-free
supernatants (60 µl) were transferred to a flat-bottom black 96-well microplate
and mixed with 12 µl 1 M NaOH and 3 µl OPT. After 4 min at room temperature,
6 µl 3 M HCl was added to stop the histamine-OPT reaction. Fluorescence intensity
was measured using a 355 nm excitation filter and a 460 nm emission filter.
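
The twofold standard series can be reproduced with a short calculation. The ten-point count is inferred here from the stated endpoints (4,000 and 7.8 ng ml⁻¹) and is not given explicitly in the text; the function name is illustrative.

```python
import math

def twofold_series(top_concentration, n_points):
    """Concentrations obtained by repeated 1:2 dilution, starting at the top standard."""
    return [top_concentration / 2 ** step for step in range(n_points)]

# Number of points needed to get from 4,000 ng/ml down to about 7.8 ng/ml.
n_points = round(math.log2(4000 / 7.8)) + 1
standards_ng_ml = twofold_series(4000.0, n_points)

for conc in standards_ng_ml:
    print(f"{conc:.1f} ng/ml")
```

Nine halvings of the top standard give 7.8125 ng ml⁻¹, which matches the lowest standard quoted in the text after rounding.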

siRNA transfection of LAD2 cells. Expression of MRGPRX2 was downregulated
with ON-TARGETplus SMARTpool siRNA against MRGPRX2 and control siRNA
from Dharmacon. LAD2 cells were washed with medium, suspended at 0.5 × 10⁶
cells per well, and transfected with 100 nM MRGPRX2 siRNA or control siRNA
in antibiotic-free StemPro medium using Lipofectamine 3000 (Life Technologies)
according to the manufacturer’s instructions at 37 °C/5% CO₂. At 48 h, knockdown
was confirmed by RT-PCR, and the cells were used for degranulation assays.

Extended Data Figure 1 | MRGPRX1 orthologues are not expressed at
relevant levels in mast cells under naive conditions. a, Results from a
low-stringency RT-PCR screen (see Methods) in peritoneal mast cells for
expression of the MRGPRX1 orthologues Mrgpra3 and Mrgprc11. Arrow points
to expected band sizes. b, Percentages of peritoneal mast cells responding to the
MRGPRX1 and Mrgprc11 agonist bovine adrenal medulla derived peptide,
fragment 8-22 (BAM8-22, 500 nM). Activation was assayed by measuring rises
in intracellular calcium, using imaging of the Fluo-4 dye. Differences are not
significant (P = 0.39). n = 3 mice from each genotype. Group data are
expressed as mean ± s.e.m. Two-tailed unpaired Student’s t-test was used to
determine significance in statistical comparisons. WT, wild type; KO,
knockout. c, Chart summarizing responses to MRGPRX2 ligands and the
MRGPRX1 ligand chloroquine (CQ) by HEK293 cells transiently transfected
with plasmids driving expression of MRGPRX2, Mrgprb2 and other mouse
Mrgpr proteins (that is, Mrgprb1, Mrgprb10 and Mrgprb11) most closely
related to Mrgprb2. Positive and negative responses are indicated with ticks and
crosses, respectively. Responses were considered positive if at least half of the
transfected cells showed a 50% increase in [Ca²⁺]ᵢ. No cells transfected with
Mrgprb1, Mrgprb10 and Mrgprb11 responded to any listed drug.

a, Mouse Mrgprb2-expressing HEK293 cells

c
Substance         Mrgprb2 EC₅₀            MRGPRX2 EC₅₀
Compound 48/80    3.7 ± 0.5 µg/ml         470.1 ± 139.6 ng/ml
Substance P       54.3 ± 4.9 µM           152.3 ± 48.0 nM
Cortistatin-14    21.3 ± 0.9 µM           106.7 ± 39.3 nM
PAMP (9-20)       12.4 ± 1.6 µM           166.0 ± 35.7 nM
Mastoparan        24.0 ± 3.6 µM           3.9 ± 0.7 µM
Icatibant         32.5 ± 2.0 µg/ml        15.8 ± 2.7 µg/ml
Cetrorelix        23.4 ± 1.4 µg/ml        221.7 ± 63.1 ng/ml
Sermorelin        29.1 ± 1.2 µg/ml        4.5 ± 0.9 µg/ml
Octreotide        10.0 ± 1.1 µg/ml        6.6 ± 0.7 µg/ml
Leuprolide        152.0 ± 7.1 µg/ml       9.1 ± 0.7 µg/ml
Atracurium        44.8 ± 1.4 µg/ml        28.6 ± 2.4 µg/ml
Rocuronium        22.2 ± 3.3 µg/ml        261.3 ± 14.4 µg/ml
Ciprofloxacin     126.5 ± 5.1 µg/ml       6.8 ± 0.5 µg/ml
Moxifloxacin      14.1 ± 2.1 µg/ml        9.9 ± 0.6 µg/ml
Levofloxacin      807.6 ± 47.1 µg/ml      22.7 ± 0.4 µg/ml
Ofloxacin         225.0 ± 25.4 µg/ml

Extended Data Figure 2 | Basic secretagogues and drugs that induce
pseudo-allergic reactions activate mouse Mrgprb2 and human MRGPRX2
expressed in HEK293 cells. a, b, Example traces showing changes in [Ca²⁺]ᵢ,
as measured by Fluo-4 imaging, from HEK293 cells expressing Mrgprb2
and Gα15 (a) or MRGPRX2 and Gα15 (b). Substances were perfused from
the 30 to 90 s time period, except for ciprofloxacin, which was perfused between
the 30 and 60 s time periods to minimize exposure to the low pH solutions it
was dissolved in. Insulin was used as a negative control. c, Table of half-maximum
effective concentration (EC₅₀) values of basic secretagogues and
drugs associated with pseudo-allergic reactions to activate Mrgprb2- and
MRGPRX2-expressing HEK293 cells. The EC₅₀ values were determined from
dose-response studies, which were repeated three times. Data are expressed as
mean ± s.e.m.

Extended Data Figure 3 | Multiple lines of BAC transgenic mice confirm
mast-cell-specific Mrgprb2 expression. Representative confocal images from
two other BAC transgenic mouse lines. BAC mice expressing eGFP-Cre in the
Mrgprb2 open reading frame were mated to tdTomato reporter mice and
tdTomato (red) expression was compared to avidin staining (green), a marker
for mast cells. Scale bar, 20 µm.

Extended Data Figure 4 | Mrgprb2 is not expressed in mucosal mast cells
or peripheral white blood cells. a, Representative images of a stomach
section from an Mrgprb2-tdTomato mouse stained with an anti-MCPT1
(β-chymase) antibody to label mucosal mast cells. White arrows indicate
positive cells. No cells were double-labelled (296 Mcpt1-labelled cells and
275 tdTomato-positive cells counted, n = 3 mice). Scale bar, 40 µm.
b, Representative images of a Cytospin preparation of peripheral white blood
cells from an Mrgprb2-tdTomato mouse doubly labelled with tdTomato for
Mrgprb2-expressing cells (red; left image) and Hoechst 33342 nuclear staining
(blue; right image). No peripheral white blood cell expressed tdTomato (n = 3
mice; >4,000 cells examined). Scale bar, 40 µm.


©2015 Macmillan Publishers Limited. All rights reserved 


Panel a, genomic coordinate axis: 48,565,000; 48,560,000; 48,555,000.

Panel b, sequence alignment:
WT   TTCCTGGGCATCCGTATGCACACGAATGCCTTCACTGTCTACATTCTCAACCTGGCTATG
MUT  TTCCTGGGCATCC----GCACACGAATGCCTTCACTGTCTACATTCTCAACCTGGCTATG

Extended Data Figure 5 | Mrgprb2MUT mice are functional knockouts.
a, Illustration of the genomic region in and around the Mrgprb2 locus. Note that
repetitive sequences including long interspersed elements (LINEs), short
interspersed elements (SINEs), and long terminal repeats (LTRs) begin
immediately after the 3’ side of the Mrgprb2 gene, and in addition are present
within 2.5 kb of the 5’ side. A BLASTN search in March 2014 using the 500
bases adjacent to the 3’ end of Mrgprb2 as a query turned up more than 269,000
hits in the mouse genome. b, Comparison of the wild-type (WT) and mutant
(MUT) genomic sequences shows the location of the 4 bp deletion in the
mutant. Numbers correspond to the Mrgprb2 open reading frame.
c, Sequencing result from wild-type and mutant complementary DNA sampled
from mice born 18 months after the mutant line was established. The bases
missing in the mutant are highlighted in red. d, Amino acid translation of the
Mrgprb2MUT open reading frame reveals that the deletion creates a frameshift
mutation and an early termination codon (marked with an asterisk) shortly
after the first transmembrane region. Mut, site of the frameshift deletion; TM1,
transmembrane region 1.

Extended Data Figure 6 | The mast cell numbers and the histamine content
of tracheal and skin tissue were not different between wild-type and
Mrgprb2MUT animals. a, Top, representative pictures of avidin staining in
wild-type (WT) and Mrgprb2MUT (MUT) mice. Scale bar, 40 µm. Bottom,
quantification of mast cell numbers in various tissues. Differences are not
significant, using a two-tailed unpaired Student’s t-test (n = 3 mice for each
genotype; over 3,000 µm² and 1,000 µm² counted for each genotype for hairy
and glabrous skin, respectively; over 10,000 peritoneal cells counted). b, The
tracheal histamine content averaged 5.9 ± 0.9 and 5.5 ± 1.6 ng mg⁻¹ (n = 5 for
each genotype), respectively; the skin histamine content averaged 30.8 ± 3.2
and 30.2 ± 4.0 ng mg⁻¹ (n = 8 for each genotype), respectively. Differences
were not significant. Group data are expressed as mean ± s.e.m. Two-tailed
unpaired Student’s t-test was used to determine significance in statistical
comparisons.

[Panels b, c: Endothelin; x axis in seconds; groups WT and MUT]

Extended Data Figure 7 | Endothelin acting through the ETA GPCR
induced comparable activation in Mrgprb2MUT and wild-type mast cells.
a, Representative heat map images of mouse peritoneal mast cells showing
changes in [Ca²⁺]ᵢ, as assayed by Fluo-4 imaging, induced by bath application
of endothelin (1 µM). Scale bar, 10 µm. b, Averages of [Ca²⁺]ᵢ imaging traces
for wild-type (WT) (red line) and Mrgprb2MUT (MUT) (black line). The
[Ca²⁺]ᵢ traces are similar between wild-type and mutant groups. Traces were
averaged as described for Fig. 2a. c, Quantification of percentage of responding
cells. Group data are expressed as mean ± s.e.m. Two-tailed unpaired
Student’s t-test was used to determine significance in statistical comparisons
(n = 3 for each genotype; over 180 cells counted for each genotype).
Endothelin-induced responses were not significantly different.

Extended Data Figure 8 | IgE-mediated inflammation does not differ
between wild-type and Mrgprb2MUT mice. a, Representative images of Evans
blue stained extravasation 15 min after intraplantar injection of anti-IgE
antibody (right, arrow, 100 µg ml⁻¹, 7 µl in saline) or saline (left).
b, Quantification of Evans blue leakage into the paw after 15 min (n = 6 for wild
type (WT), n = 7 for Mrgprb2MUT (MUT)). Differences after anti-IgE antibody
(P = 0.49) and saline (P = 0.23) injection are not significant. Group data are
expressed as mean ± s.e.m. Two-tailed unpaired Student’s t-test was used to
determine significance in statistical comparisons.

Extended Data Figure 9 | Mrgprb2MUT mast cells are unresponsive to basic
secretagogues and various therapeutic drugs. a, Example traces showing
changes in [Ca²⁺]ᵢ, as measured by Fluo-4 imaging, from wild-type (WT) and
Mrgprb2MUT (MUT) peritoneal mast cells induced by the basic secretagogues
from Fig. 2e. Each trace is a response from a unique cell. b, Representative
Fluo-4 images (left) and fluorescence traces (right) from wild-type (top) and
Mrgprb2MUT (bottom) cultured peritoneal mast cells during application of
icatibant (50 µg ml⁻¹). c, Example traces showing changes in [Ca²⁺]ᵢ, as
measured by Fluo-4 imaging, from wild-type and Mrgprb2MUT peritoneal mast
cells induced by selected FDA-approved cationic peptidergic drugs. Each trace
is a response from a unique cell. d, Representative Fluo-4 images (left) and
fluorescence traces (right) from wild-type (top) and Mrgprb2MUT (bottom)
cultured peritoneal mast cells during application of atracurium (50 µg ml⁻¹).
e, Representative Fluo-4 images (left) and fluorescence traces (right) from
wild-type (top) and Mrgprb2MUT (bottom) cultured peritoneal mast cells
during application of ciprofloxacin (200 µg ml⁻¹).

[Panel b key: Control siRNA; MRGPRX2 siRNA]

Extended Data Figure 10 | Human mast cells are activated by basic
secretagogues and drugs associated with pseudo-allergic reactions in an
MRGPRX2-dependent manner. a, Human LAD2 mast cells were treated with
different concentrations of compound 48/80, mastoparan, icatibant,
atracurium and ciprofloxacin. The activation of mast cells in response to these
substances was characterized by the release of β-hexosaminidase, TNF, PGD₂
and histamine. In addition, 0.1 µg ml⁻¹ streptavidin stimulation of biotin-conjugated
human IgE-sensitized LAD2 cells caused a robust release of
β-hexosaminidase (71.3 ± 1.8% release), compared with untreated cells
(4.1 ± 0.3% release). Group data are expressed as mean ± s.e.m. b, Knockdown
of human MRGPRX2 significantly reduced mast cell activation evoked by basic
secretagogues and drugs associated with pseudo-allergic reactions, but not
by IgE. Human LAD2 mast cells were first transfected with siRNA against
MRGPRX2 or control siRNA. Two days after the transfection, the cells were
treated with compound 48/80 (0.1 µg ml⁻¹), mastoparan (5 µg ml⁻¹), icatibant
(10 µg ml⁻¹), atracurium (25 µg ml⁻¹) and ciprofloxacin (75 µg ml⁻¹). The
activation of mast cells in response to these substances, characterized by the
release of β-hexosaminidase, was significantly reduced in MRGPRX2-siRNA-treated
cells, compared to release in the control group. IgE-mediated mast cell
degranulation was unaffected by MRGPRX2 siRNA knockdown. Group data
are expressed as mean ± s.e.m. Two-tailed unpaired Student’s t-test was used
to determine significance in statistical comparisons, and differences were
considered significant at *P < 0.05, **P < 0.01, ***P < 0.005 (the experiments
were repeated three times).
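
The summary statistics used throughout these legends (mean ± s.e.m. and the unpaired Student's t-test) can be sketched with the Python standard library. This is an illustration with hypothetical data and function names, not the authors' analysis code; it returns the t statistic and degrees of freedom, and the two-tailed P value would then be read from a t distribution (for example with a statistics package).

```python
import math
import statistics

def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

def unpaired_student_t(group_a, group_b):
    """t statistic and degrees of freedom for Student's unpaired t-test.

    Uses the pooled-variance (equal variance) form of the test.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

# Hypothetical per-mouse values for two genotypes (n = 3 each).
wild_type = [1.0, 2.0, 3.0]
mutant = [2.0, 3.0, 4.0]
wt_mean, wt_sem = mean_sem(wild_type)
t_stat, dof = unpaired_student_t(wild_type, mutant)
```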
Group 2 innate lymphoid cells promote beiging of
white adipose tissue and limit obesity


Jonathan R. Brestoff, Brian S. Kim, Steven A. Saenz, Rachel R. Stine, Laurel A. Monticelli, Gregory F. Sonnenberg,
Joseph J. Thome, Donna L. Farber, Kabirullah Lutfy, Patrick Seale & David Artis


Obesity is an increasingly prevalent disease regulated by genetic and 
environmental factors. Emerging studies indicate that immune cells, 
including monocytes, granulocytes and lymphocytes, regulate meta- 
bolic homeostasis and are dysregulated in obesity'*. Group 2 innate 
lymphoid cells (ILC2s) can regulate adaptive immunity** and eosin- 
ophil and alternatively activated macrophage responses’, and were 
recently identified in murine white adipose tissue (WAT)° where they 
may act to limit the development of obesity®. However, ILC2s have 
not been identified in human adipose tissue, and the mechanisms by 
which ILC2s regulate metabolic homeostasis remain unknown. Here 
we identify ILC2s in human WAT and demonstrate that decreased 
ILC2 responses in WAT are a conserved characteristic of obesity in 
humans and mice. Interleukin (IL)-33 was found to be critical for 
the maintenance of ILC2s in WAT and in limiting adiposity in mice 
by increasing caloric expenditure. This was associated with recruitment
of uncoupling protein 1 (UCP1)⁺ beige adipocytes in WAT,
a process known as beiging or browning that regulates caloric 
expenditure’ ’. IL-33-induced beiging was dependent on ILC2s, and 
IL-33 treatment or transfer of IL-33-elicited ILC2s was sufficient to 
drive beiging independently of the adaptive immune system, eosin- 
ophils or IL-4 receptor signalling. We found that ILC2s produce 
methionine-enkephalin peptides that can act directly on adipocytes 
to upregulate Ucp1 expression in vitro and that promote beiging in 
vivo. Collectively, these studies indicate that, in addition to respond- 
ing to infection or tissue damage, ILC2s can regulate adipose func- 
tion and metabolic homeostasis in part via production of enkephalin 
peptides that elicit beiging. 

Group 2 innate lymphoid cells (ILC2s) respond to the cytokine inter- 
leukin (IL)-33 (refs 3,10,11), and both IL-33 and ILC2s have been impli- 
cated in the regulation of metabolic homeostasis in mice’*'*. To address 
whether ILCs are present in human white adipose tissue (WAT) or 
dysregulated in obese patients, we obtained abdominal subcutaneous 
WAT from non-obese human donors and identified a lineage (Lin)- 
negative cell population that expresses CD25 (IL-2Rα) and CD127
(IL-7Rα) (Fig. 1a, Extended Data Fig. 1a). This cell population expressed
GATA binding protein 3 (GATA-3) and the IL-33 receptor (IL-33R) 
(Fig. 1b), consistent with ILC2s in other human tissues’?"*. A Lin⁻
CD25⁺ CD127⁺ cell population that expresses GATA-3 and IL-33R
was also identified in epididymal (E)-WAT of mice (Fig. 1c, d). These 
cells were developmentally dependent on inhibitor of DNA binding 
2 (Id2), transcription factor 7 (TCF-7) and the common gamma chain 
(γc) and produced the effector cytokines IL-5 and IL-13 (Extended Data
Fig. 1b-e), similar to murine ILC2s as described previously**0"""*. 

We compared ILC2 frequencies in abdominal subcutaneous WAT 
from non-obese versus obese donors (Extended Data Table 1). WAT 
from obese donors exhibited decreased frequencies of ILC2s compared
to non-obese controls (Fig. 1e, f). The obese group was enriched in older
females compared to the non-obese group, but age and sex did not explain 
the difference in ILC2 frequencies between obese and non-obese donors 
(Extended Data Fig. 1f, g). To test whether ILC2s in WAT are also dys- 
regulated in murine obesity, mice were fed a control diet or high-fat diet 
(HFD). HFD-induced obese mice exhibited decreased frequencies and 
numbers of ILC2s in E-WAT compared to wild-type mice fed a control 
diet (Fig. 1g, h). Together, these data suggest that decreased ILC2 populations
in WAT are a conserved characteristic of obesity in mice and humans.

We employed IL-33-deficient mice to test whether endogenous IL-33 
regulates ILC2 responses and the development of obesity. Il33−/− mice
exhibited decreased basal frequencies and numbers of ILC2s in E-WAT
and inguinal (i)WAT compared to Il33+/+ controls (Fig. 2a–c, Extended
Data Fig. 2a), and expression of IL-5 and IL-13 by WAT ILC2s was
decreased in Il33−/− mice compared to controls (Extended Data Fig. 2b).
Notably, when fed a normal diet, mice lacking IL-33 gained more weight, 
accumulated more E-WAT and iWAT and had increased adipocyte size 
and whole-body adiposity compared to controls (Fig. 2d-f, Extended 
Data Fig. 2c). In addition, Il33−/− mice exhibited dysregulated glucose
homeostasis as evidenced by fasting euglycaemic hyperinsulinaemia, 
increased HOMA-IR index (homeostatic model assessment of insulin 
resistance) values and impaired glucose and insulin tolerance (Extended 
Data Fig. 2d-h). Together, these results indicate that endogenous IL-33 
is required to maintain normal ILC2 responses in WAT and to limit the 
development of spontaneous obesity. 

In contrast, wild-type mice treated with recombinant murine (rm)IL-33 
exhibited increased accumulation of ILC2s in E-WAT and iWAT
(Fig. 2g-i). Although body weight did not differ between groups 
(Fig. 2j), mice treated with rmIL-33 had decreased adiposity and increased 
lean mass compared to controls (Fig. 2k). Remarkably, HFD-fed mice 
treated with rmIL-33 displayed increased E-WAT ILC2 numbers in 
association with decreased body weight and fat mass and improved glu- 
cose homeostasis compared to HFD-fed mice treated with PBS (Extended 
Data Fig. 3a–f). These beneficial metabolic effects are consistent with
studies showing a protective role for IL-33 in obesity (ref. 12) and may be related
to obesity-associated pathologies such as atherosclerosis that are limited
by IL-33 (ref. 16).

To examine the mechanisms by which IL-33 regulates adiposity 
we assessed energy homeostasis in control and rmIL-33-treated mice. 
Treatment of mice with rmIL-33 for 7 days resulted in increased caloric 
expenditure compared to controls (Fig. 2l). Food intake was unchanged
following chronic rmIL-33 treatment (Fig. 2m), and the absence of 
hyperphagia in the setting of increased caloric expenditure seemed to 
be related to decreased activity (Fig. 2n, Extended Data Fig. 4a). How- 
ever, rmIL-33 did not appear to have direct suppressive effects on food 
intake or activity levels (Extended Data Fig. 4b-d). These data suggest 
that increased caloric expenditure following 7 days of rmIL-33 treatment
could not be explained by the thermic effect of food or physical
activity levels, but was regulated by other physiologic processes. 

An emerging cell type that is critical for regulating caloric expendi- 
ture is the beige adipocyte (also known as brite, brown-like or inducible 
brown adipocyte)”*””"*. These specialized adipocytes produce heat by 
Figure 1 | Human and murine white adipose tissue contains group 2 innate
lymphoid cells that are dysregulated in obesity. a, Identification of lineage
(Lin)-negative CD25+ CD127+ innate lymphoid cells (ILCs) in human
abdominal subcutaneous white adipose tissue (WAT) of a lean donor.
Pre-gated on live CD45+ Lin− cells that lack CD3, CD5, TCRαβ, CD19, CD56,
CD11c, CD11b, CD16, and FcεRIα. b, Histograms of GATA-3 and IL-33R
expression by human WAT ILCs (line). Shaded histogram, isotype control.
c, Identification of Lin− CD25+ CD127+ ILCs in murine epididymal
(E)-WAT. Pre-gated on live CD45+ Lin− cells that lack CD3, CD5, CD19,
NK1.1, CD11c, CD11b and FcεRIα. d, Histograms of GATA-3 and IL-33R
expression by murine E-WAT ILCs (line). Shaded histogram, isotype control.
e, Representative plots and f, frequencies of human WAT ILC2s from donors
stratified into non-obese (body mass index (BMI) < 30.0 kg m⁻², n = 7)
and obese (BMI ≥ 30.0 kg m⁻², n = 7) groups. g, Representative plots and
frequencies of murine E-WAT ILC2s from mice fed a control diet (CD,
10% kcal fat, n = 5) or high-fat diet (HFD, 45% kcal fat, n = 4) for 12 weeks.
h, Numbers of murine ILC2s per gram of E-WAT in mice fed a CD (n = 8)
or HFD (n = 6) for 12 weeks. Student’s t-test, *P < 0.05, **P < 0.01,
***P < 0.001. Data are shown as mean ± standard error and are representative
of 2-3 independent experiments. Sample sizes are biological replicates.


uncoupling energy substrate oxidation from ATP synthesis”'”"*, a ther- 
mogenic process that expends calories and is dependent on uncoupling 
protein 1 (UCP1)*””. Previous work has linked brown and beige adipo- 
cyte function to the prevention of weight gain in mice and humans”? 
To test whether IL-33 regulates beiging, we examined WAT morphology
of Il33+/+ versus Il33−/− mice. iWAT from Il33+/+ mice exhibited
unilocular white adipocytes with interspersed paucilocular beige adipocytes
that have multiple small lipid droplets and increased UCP1+
cytoplasm (Fig. 3a). In contrast, iWAT from Il33−/− mice had few beige
adipocytes (Fig. 3b) and increased white adipocyte size compared to
controls (Fig. 3a, b, Extended Data Fig. 2c). Expression of Ucp1 was also
lower in iWAT of Il33−/− mice compared to controls (Fig. 3c), suggesting
that IL-33 may be a critical regulator of beiging. Consistent with
this, mice treated with rmIL-33 exhibited increased UCP1+ beige adipocytes
and elevated expression of Ucp1 messenger RNA in E-WAT
and iWAT (Fig. 3d–f) compared to controls, indicating that IL-33 can
promote beiging of WAT. Notably, the stimulatory effect of rmIL-33 
treatment on UCP1 expression was restricted to WAT and was not 
observed in brown adipose tissue (BAT) (Extended Data Fig. 5a-e). 
To test whether IL-33-elicited ILC2s promote beiging, congenic 
CD45.1+ ILC2s from E-WAT of IL-33-treated donor mice were sort-purified
and transferred into wild-type CD45.2+ recipient mice (Extended
Figure 2 | IL-33 critically regulates ILC2 responses in white adipose tissue
and limits adiposity. a–f, Il33+/+ (n = 6) or Il33−/− (n = 5) mice were fed
a control diet (10% kcal fat) for 12 weeks starting at 7 weeks of age.
a, Frequencies and b, numbers of live CD45+ Lin− CD25+ IL-33R+ ILC2s in
epididymal (E)-WAT. Plots pre-gated on CD45+ Lin− cells that lack CD3,
CD5, CD19, NK1.1, CD11c, CD11b and FcεRIα. c, Numbers of ILC2s in
inguinal (i)WAT. d, Body weight, first 10 weeks of feeding. e, Absolute and
relative E-WAT and iWAT weights. f, Body composition. g–n, Wild-type mice
were treated with phosphate buffered saline (PBS, n = 10) or recombinant
murine IL-33 (12.5 μg per kg body weight per day, n = 12) by intraperitoneal
injection for 7 days. g, Frequencies and h, numbers of ILC2s in E-WAT.
i, Numbers of ILC2s in iWAT. j, Body weight and k, body composition.
l, Caloric expenditure over a 24-h period, days 6-to-7 of treatment. Non-shaded
area, lights on. Shaded area, lights off. m, Food intake and n, total activity (beam
breaks) over the 24-h period in l. Student’s t-test or ANOVA with repeated
measures. *P < 0.05, **P < 0.01, ***P < 0.001. Data are shown as
mean ± standard error and are representative of 2 independent experiments.
Sample sizes are biological replicates.
Figure 3 | IL-33 and ILC2s contribute to beiging of white adipose tissue.
a–c, Il33+/+ (n = 6) or Il33−/− (n = 5) mice were fed a low-fat diet (10% kcal
fat) for 12 weeks starting at age 7 weeks. Uncoupling protein 1 (UCP1)
immunohistochemistry (IHC) in iWAT from a, Il33+/+ or b, Il33−/− mice.
Scale bars, 100 μm. c, Ucp1 transcript levels in iWAT. d–f, Wild-type mice
were treated with PBS or recombinant murine IL-33 (12.5 μg per kg body
weight per day) by intraperitoneal injection for 7 days. d, E-WAT and e, iWAT
UCP1 IHC. Scale bars, 100 μm. f, Ucp1 transcript levels in E-WAT and iWAT.
g–k, Sort-purified CD45.1+ ILC2s (10⁵) from E-WAT of IL-33-treated
mice were transferred into 12-week-old CD45.2+ wild-type recipients by
subcutaneous and intraperitoneal injection daily for 4 days (PBS, n = 8; ILC2,
n = 8 except panel k). g, Representative plots identifying donor and recipient
ILC2s. Plots pre-gated on live CD45+ Lin− CD25+ IL-33R+ cells. Lineage
cocktail: CD3, CD5, CD19, NK1.1, CD11c, CD11b and FcεRIα. h, Total
numbers of ILC2s per gram iWAT. i, iWAT UCP1 IHC. Scale bars, 100 μm.
j, Ucp1 expression in iWAT. k, iWAT oxygen consumption. PBS, n = 14; ILC2,
n = 15. l, m, Sort-purified congenic CD45.1+ ILC2s (10⁵) from E-WAT
of IL-33-treated mice were transferred into Rag2−/− γc−/− mice once by
intraperitoneal injection. ILC2-sufficient Rag2−/− mice, ILC2-deficient
Rag2−/− γc−/− mice and ILC2-reconstituted Rag2−/− γc−/− mice were treated
with PBS or recombinant murine IL-33 (12.5 μg per kg body weight per day) by
intraperitoneal injection for 7 days (n = 4 mice per group). l, ILC2 numbers
per gram E-WAT. m, Ucp1 expression in E-WAT. Student’s t-test or two-way
ANOVA. *P < 0.05, **P < 0.01, ***P < 0.001. Data are shown as
mean ± standard error and are representative of 2-4 independent experiments.
Sample sizes are biological replicates.


Data Fig. 6a). CD45.1+ donor ILC2s could be identified in iWAT
(Fig. 3g) and E-WAT (Extended Data Fig. 6b) of mice that received
ILC2s but not in control mice that received PBS, and total ILC2 numbers
were significantly increased in iWAT of mice receiving CD45.1+
ILC2s compared to controls (Fig. 3h). Transferred ILC2s could not be
identified in BAT, mesenteric lymph nodes or lung (Extended Data
Fig. 6b), indicating selective accumulation of WAT-derived ILC2s in
WAT of recipient mice. Transfer of ILC2s was associated with increased
UCP1+ beige adipocytes, augmented expression of Ucp1 and elevated
oxygen consumption in iWAT (Fig. 3i–k).

To test whether IL-33 promotes beiging of WAT in an ILC2-dependent 
manner, we treated ILC2-deficient Rag2−/− γc−/− mice with IL-33 in the
presence or absence of adoptively transferred congenic ILC2s (Extended
Data Fig. 6c). Rag2−/− γc−/− (γc is also known as Il2rg) mice supported
accumulation and IL-33-induced population expansion of transferred
E-WAT-derived ILC2s in host E-WAT (Fig. 3l, Extended Data Fig. 6d).
IL-33 treatment increased expression of Ucp1 in E-WAT of ILC2-sufficient
Rag2−/− controls but not ILC2-deficient Rag2−/− γc−/− mice
(Fig. 3m). Strikingly, rmIL-33-induced increases in expression of Ucp1
and beiging were restored in ILC2-reconstituted Rag2−/− γc−/− mice
(Fig. 3m, Extended Data Fig. 6e). Collectively, these results indicate
that IL-33-induced beiging of WAT requires a γc-dependent cell population
and that ILC2s are sufficient to rescue this defect, suggesting that
IL-33-induced beiging is critically dependent on ILC2s. 

ILC2s have been shown to promote the eosinophil/IL-4Rα/alternatively
activated macrophage (AAMac) pathway that can elicit beiging through
IL-4Rα-dependent production of noradrenaline by AAMacs*??,. In
addition, regulatory T (Treg) cells in WAT are known to be critical for
regulating glucose homeostasis in mice” and are increased following
rmIL-33 treatment (Extended Data Fig. 3g, h). Therefore, we sought
to test whether the IL-33/ILC2 pathway could promote beiging in the
absence of eosinophils, IL-4Rα or the adaptive immune system. Remarkably,
delivery of rmIL-33 to DblGata1 (also known as Gata1tm6Sho,
eosinophil-deficient), Il4ra−/− or Rag2−/− mice elicited beiging of
WAT (Fig. 3m, Extended Data Fig. 7a–f), and transfer of IL-33-elicited
ILC2s to DblGata1, Il4ra−/− or Rag1−/− mice resulted in accumulation
of ILC2s in iWAT and recruitment of UCP1+ beige adipocytes
(Extended Data Fig. 7g–l). Therefore, although eosinophils, AAMacs
and adaptive immune cells may contribute to optimal beiging under
some physiologic settings, these data indicate that the IL-33/ILC2 axis
can promote beiging independently of the eosinophil/IL-4Rα/AAMac
pathway and the adaptive immune system. 

Obesity is associated with both decreased ILC2s (Fig. 1) and defective 
beige adipocytes”’”’. To address whether ILC2s produce factors that 
could directly regulate beiging, we employed genome-wide transcrip- 
tional profiling of ILC2s versus group 3 ILCs (ILC3s) to compare gene 
expression enrichment scores of 69 genes previously linked to human 
obesity (Extended Data Table 2)**’’. This analysis identified one gene, 
proprotein convertase subtilisin/kexin type 1 (Pcsk1) (also known as 
prohormone convertase 1, PC1), to be significantly enriched in ILC2s 
but not ILC3s (Fig. 4a, P < 0.01). PCSK1 is an endopeptidase involved in
processing some prohormones into active forms”, and loss-of-function 
mutations in both mice and humans are associated with increased sus- 
ceptibility to obesity and decreased caloric expenditure”. The most 
differentially expressed PCSK1 target in ILC2s versus ILC3s was pro- 
enkephalin A (Penk) (Fig. 4b), which encodes endogenous opioid-like 
peptides such as methionine-enkephalin (MetEnk). Production of MetEnk 
by ILC2s was confirmed by flow cytometric analysis of sort-purified 
ILC2s (Fig. 4c). Following IL-33 stimulation, ILC2 production of MetEnk 
peptides was increased (Fig. 4d). In vivo delivery of MetEnk peptides
into wild-type mice elicited UCP1+ beige adipocytes, upregulated expression
of Ucp1 and increased oxygen consumption in iWAT (Fig. 4e–g),
indicating the formation of functional beige fat. Consistent with this,
MetEnk treatment decreased iWAT mass (Fig. 4h). These changes were
not associated with increased expression of Il4 or Il13 (Fig. 4i) or altered
eosinophil or AAMac numbers in iWAT (Fig. 4j).

Gene expression analyses in wild-type mice at steady state indicated
that Il33 and Penk expression levels were increased in iWAT compared
to BAT (Fig. 4k). In addition, expression of the MetEnk receptor δ1
opioid receptor (Oprd1) was higher in iWAT compared to BAT, whereas
expression of the other known MetEnk receptor opioid growth factor
receptor (Ogfr) was lower in iWAT compared to BAT (Fig. 4l), suggesting
that there may be tissue-specific effects of MetEnk in iWAT
compared to BAT. Consistent with this, MetEnk stimulation induced
Ucp1 expression in cultured primary adipocytes from iWAT (Fig. 4m)
but not BAT (Fig. 4n). Taken together, these results identify that ILC2s
express MetEnk that can directly promote beiging of WAT (Extended
Data Fig. 8).

To our knowledge, these data collectively provide the first demonstration
that dysregulated ILC2 responses in WAT are a conserved feature
of obesity in humans and mice and that the IL-33/ILC2 axis regulates
metabolic homeostasis by eliciting beiging of white adipose tissue. Production
of enkephalin peptides is a previously unrecognized effector
mechanism employed by ILC2s to regulate metabolic homeostasis. From
an evolutionary perspective, coupling ILC2-dependent innate immune
effector functions with the maintenance of systemic metabolic homeostasis
could provide a rapid, integrated multi-organ response that allows
mammals to surmount multiple environmental challenges including
infection, nutrient stress or changes in temperature. Given that impaired
beige adipocyte function is associated with increased weight gain and
obesity in mice”? and that activity of brown/beige’’” adipose tissue is
dysregulated in obese patients’, targeting the IL-33/ILC2/beiging
pathway could represent a new approach for treating obesity and obesity-
associated diseases.

Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.

Figure 4 | ILC2s produce methionine-enkephalin, a peptide that promotes
beige fat formation. a, Gene expression enrichment analyses of 69
obesity-associated genes in ILC2s (x axis, n = 4) versus ILC3s (y axis, n = 4).
Genes significantly enriched in one cell type but not the other are red.
b, Differential expression of PCSK1 target genes in ILC2s versus ILC3s.
c, Intracellular staining of MetEnk (black line) or rabbit IgG isotype control
(shaded histogram) in ILC2s sort-purified from E-WAT and re-stimulated
in vitro with IL-2 and IL-7 (10 ng ml⁻¹) for 4 days. d, MetEnk mean
fluorescence intensity (MFI) in sort-purified E-WAT ILC2s re-stimulated
in vitro with IL-2 and IL-7 (10 ng ml⁻¹) with or without IL-33 (30 ng ml⁻¹) for
4 days. Isotype control MFI for each group was subtracted before calculating
relative expression. Shown are averages from 4 independent experiments,
each representing pooled cells from n = 3-5 mice and measured in duplicate or
triplicate. e–j, Wild-type mice were treated with PBS (n = 7) or MetEnk (n = 9)
by subcutaneous injection (10 mg per kg body weight per day) for 5 days.
e, Uncoupling protein 1 (UCP1) immunohistochemistry (IHC) in inguinal
white adipose tissue (iWAT). Scale bars, 100 μm. f, iWAT Ucp1 expression;
g, iWAT oxygen consumption; h, iWAT relative mass; i, iWAT Il4 and Il13
expression and j, numbers of eosinophils (Eos, live CD45+ SiglecF+ SSC) and
alternatively activated macrophages (AAMacs, live CD45+ SiglecF− F4/80+
CD206+) per gram of WAT. k, Il33 and Penk mRNA and l, Ogfr and Oprd1
mRNA in iWAT versus brown adipose tissue (BAT), n = 8. m, n, Stromal
vascular fraction (SVF) cells from m, iWAT or n, BAT of 4-week-old C57BL/6
mice were differentiated into primary adipocytes for 2 days, treated with PBS or
50 μM MetEnk from days 2-8 and harvested on day 8 (iWAT: n = 7 PBS, n = 8
MetEnk; BAT: n = 6 PBS, n = 6 MetEnk). Student’s t-test or ANOVA,
*P < 0.05, ***P < 0.001. Data are shown as mean ± standard error and
are representative of 2-3 independent experiments. Sample sizes are
biological replicates.


innate lymphoid type 2 and type II NKT cells that regulate obesity in mice.
J. Immunol. 191, 5349-5353 (2013).
7. Harms, M. & Seale, P. Brown and beige fat: development, function and therapeutic
potential. Nature Med. 19, 1252-1263 (2013).
8. Shabalina, I. G. et al. UCP1 in brite/beige adipose tissue mitochondria is
functionally thermogenic. Cell Rep. 5, 1196-1203 (2013).
9. Cohen, P. et al. Ablation of PRDM16 and beige adipose causes metabolic
dysfunction and a subcutaneous to visceral fat switch. Cell 156, 304-316 (2014).
10. Price, A. E. et al. Systemically dispersed innate IL-13-expressing cells in type 2
immunity. Proc. Natl Acad. Sci. USA 107, 11489-11494 (2010).
11. Neill, D. R. et al. Nuocytes represent a new innate effector leukocyte that mediates
type-2 immunity. Nature 464, 1367-1370 (2010).
12. Miller, A. M. et al. Interleukin-33 induces protective effects in adipose tissue
inflammation during obesity in mice. Circ. Res. 107, 650-658 (2010).
13. Mjösberg, J. M. et al. Human IL-25- and IL-33-responsive type 2 innate lymphoid
cells are defined by expression of CRTH2 and CD161. Nature Immunol. 12,
1055-1062 (2011).
14. Monticelli, L. A. et al. Innate lymphoid cells promote lung-tissue homeostasis after
infection with influenza virus. Nature Immunol. 12, 1045-1054 (2011).
15. Yang, Q. et al. T cell factor 1 is required for group 2 innate lymphoid cell generation.
Immunity 38, 694-704 (2013).
16. Miller, A. M. et al. IL-33 reduces the development of atherosclerosis. J. Exp. Med.
205, 339-346 (2008).
17. Wu, J. et al. Beige adipocytes are a distinct type of thermogenic fat cell in mouse
and human. Cell 150, 366-376 (2012).
18. Rosen, E. D. & Spiegelman, B. M. What we talk about when we talk about fat. Cell
156, 20-44 (2014).
19. Feldmann, H. M., Golozoubova, V., Cannon, B. & Nedergaard, J. UCP1 ablation
induces obesity and abolishes diet-induced thermogenesis in mice exempt from
thermal stress by living at thermoneutrality. Cell Metab. 9, 203-209 (2009).
20. Carey, A. L. et al. Ephedrine activates brown adipose tissue in lean but not obese
humans. Diabetologia 56, 147-155 (2013).
21. Saito, M. et al. High incidence of metabolically active brown adipose tissue in
healthy adult humans: effects of cold exposure and adiposity. Diabetes 58,
1526-1531 (2009).
22. Qiu, Y. et al. Eosinophils and type 2 cytokine signaling in macrophages orchestrate
development of functional beige fat. Cell 157, 1292-1308 (2014).
23. Wu, D. et al. Eosinophils sustain adipose alternatively activated macrophages
associated with glucose homeostasis. Science 332, 243-247 (2011).
24. Liu, P.-S. et al. Reducing RIP140 expression in macrophage alters ATM infiltration,
facilitates white adipose tissue browning, and prevents high-fat diet-induced
insulin resistance. Diabetes 63, 4021-4031 (2014).
25. Feuerer, M. et al. Lean, but not obese, fat is enriched for a unique population of
regulatory T cells that affect metabolic parameters. Nature Med. 15, 930-939
(2009).
26. McCarthy, M. I. Genomics, type 2 diabetes, and obesity. N. Engl. J. Med. 363,
2339-2350 (2010).

12 MARCH 2015 | VOL 519 | NATURE | 245 


©2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


27. Walley, A. J., Asher, J. E. & Froguel, P. The genetic contribution to non-syndromic 
human obesity. Nature Rev. Genet. 10, 431-442 (2009). 

28. Seidah, N. G., Sadr, M. S., Chretien, M. & Mbikay, M. The multifaceted proprotein 
convertases: their unique, redundant, complementary, and opposite functions. 
J. Biol. Chem. 288, 21473-21481 (2013). 

29. Lloyd, D. J., Bohan, S. & Gekakis, N. Obesity, hyperphagia and increased metabolic
efficiency in Pc1 mutant mice. Hum. Mol. Genet. 15, 1884-1893 (2006).

30. Sharp, L. Z. et al. Human BAT possesses molecular signatures that resemble
beige/brite cells. PLoS ONE 7, e49452 (2012).


Acknowledgements The authors wish to thank members of the Artis laboratory for the 
critical reading of this manuscript. Research in the Artis laboratory is supported by the 
National Institutes of Health (AI061570, AI074878, AI095466, AI095608, AI102942,
and AI097333 to D.A.), the Burroughs Wellcome Fund Investigator in Pathogenesis of
Infectious Disease Award (D.A.) and Crohn’s & Colitis Foundation of America (D.A.).
Additional funding was provided by NIH F30-AI112023 (J.R.B.), T32-AI060516 (J.R.B.),
T32-AI007532 (L.A.M.), KL2-RR024132 (B.S.K.), DP5OD012116 (G.F.S.), P01AI06697
(D.L.F.), F31AG047003 (J.J.T.) and DP2OD007288 (P.S.) and by the Searle Scholars
Award (P.S.). We thank M. A. Lazar for scientific and technical advice, D. E. Smith for
providing Il33−/− mice, A. Goldrath for providing Id2−/− chimaeras, and A. Bhandoola
for providing Tcf7−/− mice. We also thank the Mouse Phenotyping, Physiology &
Metabolism Core at the Diabetes Research Center (DRC) of the Institute for Diabetes,
Obesity & Metabolism (IDOM) as well as the Penn Diabetes Endocrine Research Center
Grant (P30DK19525). In addition, we thank the Matthew J. Ryan Veterinary Hospital
Pathology Laboratory, the Penn Microarray Facility, and the Mucosal Immunology
Studies Team (MIST) of the NIH NIAID for shared expertise and resources. The authors 
would also like to thank the Abramson Cancer Center Flow Cytometry and Cell Sorting 
Resource Laboratory for technical advice and support. The ACC Flow Cytometry and 
Cell Sorting Shared Resource is partially supported by NCI Comprehensive Cancer 
Center Support Grant (no. 2-P30 CA016520). This work was supported by the 
NIH/NIDDK P30 Center for Molecular Studies in Digestive and Liver Diseases 
(P30-DK050306), its pilot grant program and scientific core facilities (Molecular 
Pathology and Imaging, Molecular Biology, Cell Culture and Mouse), as well as the Joint 
CHOP-Penn Center in Digestive, Liver and Pancreatic Medicine and its pilot grant 
program. In addition, we would like to acknowledge and thank the New York Organ 
Donor Network, the Cooperative Human Tissue Network-Eastern Division and 
especially the donors and their families. We apologize to colleagues whose work we 
were unable to quote owing to space constraints. 


Author Contributions J.R.B., B.S.K., S.A.S., R.R.S., L.A.M., G.F.S., K.L., P.S. and D.A.
designed and performed the research and/or provided advice and technical expertise.
J.J.T. and D.L.F. provided human tissues. J.R.B. and D.A. analysed the data and wrote the
manuscript. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of the paper. Correspondence 
and requests for materials should be addressed to D.A. (dartis@med.cornell.edu). 




METHODS 

Mice. C57BL/6, CD45.1+ C57BL/6, Rag1−/− and DblGata1 (Balb/c background)
mice were obtained from Jackson Labs. Rag2−/−, Rag2−/− γc−/−, Il33+/+ Balb/c
and Il4ra−/− (Balb/c background) mice were obtained from Taconic. Il33−/− mice
were provided by Amgen Inc. via Taconic. Id2−/− bone marrow chimaeras'* and
Tcf7−/− mice’? were generated as described previously. Unless otherwise noted, all
mice were on a C57BL/6 background. All mice were males and had ad libitum access 
to food and water and were maintained in a specific-pathogen free facility with a 
12 h:12h light:dark cycle. Animals were randomly assigned to groups of n = 3-5 
mice per group per experiment, and at least two independent experiments were 
performed throughout. In all in vivo experiments, a single technical replicate per 
mouse was performed except in glucose homeostasis tests described below, in 
which 2-4 technical replicates were performed per mouse for each time point. For 
all mRNA analyses, biological replicates were measured in duplicate or triplicate. 
For all in vitro experiments, 2-3 technical replicates were performed in each inde- 
pendent experiment. Sample sizes in each independent experiment were selected 
to have power of at least 90% using published sample size/power formulas*'. Studies 
were not blinded. All experiments were carried out under the guidelines of the 
Institutional Animal Care and Use Committee at the University of Pennsylvania. 
Human samples. Subcutaneous white adipose tissue (S-WAT) from the abdominal
region was obtained from human donors via the New York Human Organ Donor 
Network (NYODN) and via the Cooperative Human Tissue Network (CHTN) 
Eastern Division, University of Pennsylvania. Donor characteristics are summa- 
rized in Extended Data Table 1. NYODN samples were from recently deceased 
organ donors at the time of organ acquisition for clinical transplantation through 
an approved research protocol and MTA with the NYODN. All NYODN donors 
were free of cancer and were hepatitis B, hepatitis C and human immunodefi- 
ciency virus-negative. Tissues were collected after the donor organs were flushed 
with cold preservation solution and clinical procurement process was completed. 
Samples from CHTN were collected from non-deceased adults undergoing pan- 
niculectomies, and were harvested from discarded connective tissue by CHTN staff. 
All human samples from NYODN and CHTN were stored in DMEM on ice or at 
4 °C for 24-48 h before processing. Donors were defined as non-obese if their body 
mass index (BMI) was <30.0 kg m⁻² (n = 7) or obese if their BMI was ≥30.0 kg m⁻²
(n = 7). Sample sizes per group were selected to have power >95% using pub- 
lished sample size/power formulas*'. There were no differences in the proportion of 
donors from NYODN or CHTN between non-obese and obese groups (Extended 
Data Table 1). ILC2 frequencies were also compared for all characteristics shown 
in Extended Data Table 1, and those characteristics that had a P value < 0.10 were 
interrogated to test whether they could explain the differences in ILC2 frequencies 
observed between non-obese versus obese donors. The human samples from NYODN 
do not qualify as ‘human subjects’ research, as confirmed by the Columbia Uni- 
versity IRB, and the human samples from CHTN were de-identified and were not 
obtained for the specific purpose of these studies and therefore are not considered 
‘human subjects’ research. 
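
The donor stratification described above can be sketched in a few lines. This is only an illustration: the 30.0 kg m⁻² cut-off comes from the text, whereas the function names and the example donor values are hypothetical and are not taken from Extended Data Table 1.

```python
# Illustrative sketch of the donor stratification rule.
# Only the 30.0 kg m^-2 cut-off is from the text; names and values are hypothetical.
OBESITY_BMI_CUTOFF = 30.0  # kg m^-2


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg m^-2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def classify_donor(bmi_value: float) -> str:
    """Return 'obese' for BMI >= 30.0 kg m^-2, otherwise 'non-obese'."""
    return "obese" if bmi_value >= OBESITY_BMI_CUTOFF else "non-obese"


if __name__ == "__main__":
    # Hypothetical donors: (weight in kg, height in m)
    for weight, height in [(70.0, 1.75), (100.0, 1.70)]:
        value = bmi(weight, height)
        print(f"BMI {value:.1f} kg m^-2 -> {classify_donor(value)}")
```

Note that a donor with a BMI of exactly 30.0 kg m⁻² falls in the obese group under this rule.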

Diet-induced obesity. Where indicated, mice were fed a control diet (CD, 10% kcal 
fat, Research Diets, New Brunswick, New Jersey) or high fat diet (HFD, 45% or 
60% kcal fat as indicated, Research Diets) for the indicated period of time start- 
ing at 6-8 weeks of age. CD and HFD were gamma-irradiated (10-20 kGy). In all 
experiments that did not employ HFD or CD, mice were fed a standard autoclavable
rodent chow (5% kcal fat, 5010, Lab Diets, St. Louis, Missouri).

In vivo cytokine and enkephalin peptide treatments. Mice were administered 
12.5 μg per kg body weight carrier-free recombinant murine IL-33 (rmIL-33, R&D
Systems, Minneapolis, Minnesota) in sterile phosphate buffered saline (PBS) by
intraperitoneal (i.p.) injection for 7 days at the indicated dose. In HFD studies, mice
were treated with 12.5 μg per kg body weight recombinant murine IL-33 or PBS
once every 4 days by i.p. injection. In some studies, mice were treated with a previously
reported’? dose of 10 mg per kg body weight [Met5]-enkephalin acetate salt
hydrate (MetEnk, amino acid sequence YGGFM, ≥95.0% purity by HPLC, Sigma
Aldrich, St. Louis, MO) in PBS or with PBS alone by bilateral subcutaneous injection
near the iWAT daily for 5 days (approximately 200 μl per side). MetEnk or
vehicle injections were performed under isoflurane anaesthesia.
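
Because both treatments are dosed per kg body weight, the amount given to an individual mouse scales with its weight. The sketch below shows that arithmetic; the dose constants are the ones stated above, while the function name and the 25 g body weight are assumptions chosen purely for illustration.

```python
# Illustrative dose arithmetic for weight-based dosing.
# Dose constants are from the text; the function name and the
# 25 g example body weight are hypothetical.
RMIL33_DOSE_UG_PER_KG = 12.5      # recombinant murine IL-33, micrograms per kg
METENK_DOSE_UG_PER_KG = 10_000.0  # MetEnk, 10 mg per kg expressed in micrograms


def dose_per_mouse_ug(dose_ug_per_kg: float, body_weight_g: float) -> float:
    """Amount (micrograms) to give one mouse of the stated body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return dose_ug_per_kg * body_weight_g / 1000.0


if __name__ == "__main__":
    weight_g = 25.0  # hypothetical mouse
    print(f"rmIL-33: {dose_per_mouse_ug(RMIL33_DOSE_UG_PER_KG, weight_g):.4f} ug")
    print(f"MetEnk:  {dose_per_mouse_ug(METENK_DOSE_UG_PER_KG, weight_g):.1f} ug")
```

For a 25 g mouse this works out to about 0.31 μg rmIL-33 and 250 μg MetEnk per injection.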
Sort-purification and transfer of ILC2s. E-WAT was harvested from male CD45.1+
C57BL/6 mice that received daily injections of rmIL-33 (12.5 μg per kg body weight)
for 7 days by intraperitoneal injection. Live CD45+ Lin− CD25+ IL-33R+ ILC2s
were sort-purified using an Aria Cell Sorter (BD) to ≥98% purity, and 10⁵ ILC2s
were immediately transferred to the indicated recipient mice by intraperitoneal
injection (5 × 10⁴ cells) and by subcutaneous injection near iWAT (5 × 10⁴ cells
split evenly for bilateral injections). Daily transfers were performed for 4 consecutive
days, and tissues were harvested on day 5. In ILC2 reconstitution experiments
involving Rag2−/− γc−/− recipient mice, 10⁵ ILC2s were transferred by a single
intraperitoneal injection, and the next day mice were treated with PBS or rmIL-33
(12.5 μg per kg body weight) by daily intraperitoneal injection for 7 days.
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In vivo metabolic phenotyping. Mice were single-housed in an OxyMax Com- 
prehensive Laboratory Animal Monitoring System (CLAMS, Columbus Instruments, 
Columbus, Ohio) for 24h. Mice were acclimated to the CLAMS cages for 24 h before 
measurements commenced. Fat mass and adiposity were measured by 'H-nuclear 
magnetic resonance (NMR) spectroscopy. For glucose tolerance tests, mice were 
fasted overnight for 14-16 h and injected with 2 g per kg body weight D-glucose by 
ip. injection. Blood glucose values were measured just before injection (time 0) and 
at 20, 40, 60, 90 and 120 min post-injection. For insulin tolerance tests, mice were 
fasted for 4-6 h and then injected with bovine insulin (0.5 U per kg body weight). 
Blood glucose values were measured just before injection (time 0) and at 20, 40 and 
60 min post-injection. To measure fasting blood glucose and insulin concentrations, 
mice were fasted overnight for 14-16 h, and blood glucose values were measured 
followed by collection of approximately 20-30 ul blood for serum insulin concentra- 
tion determination using the Ultra Sensitive Mouse Insulin ELISA Kit (Crystal Chem). 
Homeostatic model assessment of insulin resistance (HOMA-IR) index values were 
calculated as described previously**. All blood glucose measurements were per- 
formed using FreeStyle Lite handheld glucometer (Abbot) in duplicate or triplicate. 
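As an illustration of the index calculation, the Python sketch below applies the widely used HOMA-IR formula (fasting glucose in mmol/l multiplied by fasting insulin in µU/ml, divided by 22.5). It is an assumption that the cited method uses this exact form. The function names, the mg/dl input unit and the example values are hypothetical, and the conversion of the insulin ELISA readout to µU/ml is left to the user.

```python
from statistics import mean

# Glucose molar mass is 180.16 g/mol, so 1 mmol/l corresponds to 18.016 mg/dl.
GLUCOSE_MG_DL_PER_MMOL_L = 18.016


def mean_glucose(readings_mg_dl):
    """Average duplicate or triplicate glucometer readings (mg/dl)."""
    return mean(readings_mg_dl)


def homa_ir(glucose_mg_dl, insulin_uU_ml):
    """Standard HOMA-IR: glucose (mmol/l) x insulin (uU/ml) / 22.5.

    Assumed form; the source only says the index was calculated
    "as described previously".
    """
    glucose_mmol_l = glucose_mg_dl / GLUCOSE_MG_DL_PER_MMOL_L
    return glucose_mmol_l * insulin_uU_ml / 22.5


# Hypothetical values, not data from this study.
glucose = mean_glucose([88.0, 92.0])          # duplicate fasting readings
index = homa_ir(glucose, insulin_uU_ml=10.0)  # dimensionless index
```

Averaging the replicate readings before computing the index mirrors the duplicate or triplicate measurement described above.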
Histologic analysis. Tissues were fixed in 4% paraformaldehyde in PBS for at least
48 h at 4 °C and embedded in paraffin before cutting 5-µm sections and staining
with haematoxylin and eosin (H&E) or performing immunohistochemistry (IHC)
with rabbit anti-UCP1 antibody (Abcam, ab10983). For IHC, rehydrated sections
were microwaved in 10 mM citric acid buffer (pH 6.0) for antigen retrieval, and
endogenous peroxidases were quenched with 3% hydrogen peroxide. Sections were
blocked with Avidin D, biotin and protein blocking agent in sequential order
followed by application of the anti-UCP1 antibody (1:500). A biotinylated anti-rabbit
antibody was used as a secondary antibody. Horseradish peroxidase-conjugated ABC
reagent was applied, and then DAB reagent was used to develop the signal before
counterstaining in haematoxylin and dehydrating the sections in preparation for
mounting. Stained sections were visualized and photographed using a Nikon E600
bright field microscope.

Adipocyte area quantification. Inguinal white adipose tissue (iWAT) sections
were H&E stained and imaged at ×40 magnification. White adipocyte area was
calculated using ImageJ software by drawing ellipses circumscribing white
adipocytes. The scale was set to 8 pixels per µm based on the pixel length of a 100-µm scale
bar at ×40 magnification. Two to three images, each from a different area of a given
sample, were captured per animal. Adipocyte area was measured in 10–20
adipocytes per image (25–40 adipocytes per mouse) and averaged on a per-mouse basis.
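The area calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming each drawn ellipse is exported as its major and minor axis lengths in pixels; the function names and input layout are hypothetical and are not ImageJ calls.

```python
import math
from statistics import mean

PIXELS_PER_UM = 8.0  # scale stated in the text: 8 pixels per µm at x40


def ellipse_area_um2(major_px, minor_px, pixels_per_um=PIXELS_PER_UM):
    """Area of an ellipse given full axis lengths in pixels, returned in µm²."""
    area_px2 = math.pi * (major_px / 2.0) * (minor_px / 2.0)
    # Areas scale with the square of the linear calibration factor.
    return area_px2 / pixels_per_um ** 2


def mean_area_per_mouse(images):
    """Average adipocyte area for one mouse.

    `images` is a list of images; each image is a list of
    (major_px, minor_px) tuples, one per measured adipocyte.
    """
    areas = [ellipse_area_um2(a, b) for image in images for (a, b) in image]
    return mean(areas)


# Hypothetical measurements: two images from one mouse.
example = [[(80, 80)], [(160, 80)]]
per_mouse = mean_area_per_mouse(example)
```

Pooling all adipocytes from the two or three images before averaging gives one value per mouse, so that the animal is the unit of analysis.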
Isolation of immune cells and flow cytometry. Murine epididymal white adipose
tissue (E-WAT), inguinal WAT (iWAT) or brown adipose tissue (BAT) or human
subcutaneous abdominal WAT were harvested and digested with 0.1% collagenase
type II (Sigma-Aldrich, USA) at 37 °C with shaking at 200 r.p.m. for 60–90 min.
Digested tissues were filtered through a 70-µm nylon mesh and centrifuged at 500g
for 5 min. Floating adipocytes were removed, and the stromal vascular fraction
(SVF) pellet was resuspended in red blood cell lysis buffer (ACK RBC Lysis Buffer).
Recovered cells were washed and stained with live/dead stain (Molecular Probes)
followed by standard surface staining for flow cytometric analysis with
fluorochrome-conjugated antibodies. Murine cells were stained with combinations of the
following antibodies: anti-mouse CD45-eFluor 605NC (clone 30-F11), CD45.1-eFluor
450 (A20), CD45.2-Alexa Fluor 700 (104), F4/80-eFluor 450 (BM8),
CD3ε-PerCP-Cy5.5 (145-2C11), CD5-PerCP-Cy5.5 (53-7.3), CD19-PerCP-Cy5.5 (1D3),
NK1.1-PerCP-Cy5.5 (PK136), CD11c-PerCP-Cy5.5 (N418), FcεRIα-FITC (MAR-1),
Foxp3-FITC (FJK-16s), GATA-3-PE (TWAJ) and CD25-PE-Cy7 (clone PC61.5)
from eBioscience (San Diego, CA); CD11b-PE-Texas Red (M1/70.15) from Life
Technologies (Grand Island, NY); CD90.2-Alexa Fluor 700 (30-H12) and
CD4-Brilliant Violet-650 (RM4-5) from BioLegend (San Diego, CA); SiglecF-PE
(E50-2440) and CD3ε-PE-CF594 (145-2C11) from BD Biosciences (San Jose, CA);
IL-33R-biotin (T1/ST2, clone DJ8) from MD Bioproducts (St. Paul, MN);
CD206-Alexa Fluor 647 (MR5D3) from AbD Serotec (Raleigh, NC); and Streptavidin-APC
from eBioscience. Foxp3, GATA-3 and CD206 staining was performed following
fixation and permeabilization with the Foxp3 Staining Buffer Set (eBioscience).
Human cells were stained with anti-human GATA-3-PE (TWAJ),
TCRαβ-PerCP-Cy5.5 (IP26), CD5-PerCP-Cyanin5.5 (L17F12), CD19-Alexa Fluor 700 (HIB19),
CD11c-Alexa Fluor 700 (3.9), CD127-eFluor 780 (eBioRDR5), CD45-eFluor 605NC
(HI30), FcεRIα-biotin (AER-37) and Streptavidin-eFluor 650NC from eBioscience;
CD56-Alexa Fluor 700 (B159), CD16 (3G8), CD3 (SP34-2) and CD25-PE-Cy7
(M-A251) from BD Pharmingen; CD11b-PE-Texas Red (M1/70.15) from Life
Technologies; and ST2L-FITC from MD Bioproducts. Stained cells were acquired
on a BD LSRII flow cytometer (BD Biosciences), and data were analysed using FlowJo
software version 9.6.4 (Tree Star, Inc.).

Intracellular cytokine analysis. To examine ILC2 effector cytokine production,
single-cell suspensions of E-WAT or iWAT SVF were stimulated for 4 h ex vivo with
phorbol 12-myristate 13-acetate (PMA) (100 ng ml⁻¹) and ionomycin (1 µg ml⁻¹) in
the presence of brefeldin A (10 µg ml⁻¹) (all from Sigma-Aldrich) in a 37 °C incubator
(5% CO₂). Cells were then surface-stained, fixed and permeabilized using Cyto
Fix/Perm (BD Pharmingen) according to manufacturer’s instructions before
intracellular staining for IL-5 (APC-IL-5, clone TRFK5, eBioscience) and IL-13 (PE-IL-13,
eBio13A, eBioscience). Monensin (1:1500) was also used for intracellular staining
with rabbit anti-mouse MetEnk (bs-1759R, Bioss USA, Woburn, MA) or rabbit
anti-mouse IgG (Isotype control, Bioss USA) followed by staining with goat anti-rabbit
PE (sc-3739, Santa Cruz Biotechnology, Dallas, TX).

Real-time PCR. Adipose tissues were snap-frozen in TRIzol (Invitrogen) and
homogenized using a Tissue Lyser (Qiagen). RNA was isolated from the aqueous phase
using the RNeasy kit (Qiagen) in accordance with the manufacturer’s instructions.
cDNA was synthesized from 1.0 µg RNA using Superscript II Reverse
Transcriptase (Invitrogen) and oligo(dT) (Invitrogen). Real-time PCR was performed using
SYBR Green technology (Applied Biosystems) with previously published primer
sequences for murine Ucp1 and Qiagen QuantiTect real-time PCR primers for
β-actin, Il33, Penk, Oprd1 and Ogfr. Reactions were run on the 7500 Fast Real-Time
PCR System (Applied Biosystems) or the QuantStudio 6 Flex Real-Time PCR
System (Applied Biosystems). Results were normalized to the housekeeping gene
β-actin, and the ΔΔCt method was employed for all real-time PCR analyses.
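For readers unfamiliar with the ΔΔCt calculation, a minimal Python sketch follows. It assumes roughly 100% amplification efficiency (a doubling per cycle), which the source does not state. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the delta-delta-Ct method.

    Each Ct is first normalized to the housekeeping gene (delta-Ct),
    then the treated sample is compared with the control (delta-delta-Ct).
    Assumes the amount of product doubles every cycle.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target gene versus a housekeeping gene.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_ref_sample=18.0,    # treated tissue
    ct_target_control=24.0, ct_ref_control=18.0,  # control tissue
)
```

In this example the target crosses threshold two cycles earlier in the treated sample after normalization, which corresponds to a fourfold increase.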
Microarrays and ILC2 versus ILC3 gene enrichment analyses. Microarray
analyses (~25,000 genes) were performed using previously published microarray data
sets (GEO GSE46468). In brief, Lin⁻ CD90⁺ CD25⁺ IL-33R⁺ ILC2s from the lung
(4 biological replicates each comprising 6 pooled lungs) and Lin⁻ CD90⁺ CD25⁺
CD4⁺ ILC3s from the spleen (4 biological replicates each comprising 10 pooled
spleens) were sorted using a FACS Aria (BD) directly into TRIzol LS (Invitrogen)
at a purity of >97% (1.5–2.0 × 10⁴ cells per replicate). mRNA was isolated,
amplified, labelled and hybridized to Affymetrix GeneChip (Mouse Gene 1.0 ST) as
described previously. Gene expression Z-scores were calculated for each of 69
obesity-associated genes in ILC2s or ILC3s (see Extended Data Table 2 for a
complete list of genes). Genes that were significantly enriched (compared to the
average gene expression level of the entire microarray data set) in one cell population
(Z > 2.20) but not the other were considered to be differentially enriched in that cell
population. Bonferroni correction (alpha = 0.05, k = 69) was applied for microarray
analyses to account for multiple testing.
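The enrichment rule described above can be sketched as follows. This is an illustrative Python outline under stated assumptions, not the authors' pipeline: it assumes the Z-score is computed against the mean and sample standard deviation of all genes on the array, it takes the threshold from the text as a parameter, and all gene names and expression values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(gene_expr, all_expr):
    """Z-score of each gene of interest against the whole-array distribution.

    gene_expr: dict mapping gene name -> expression value in one cell population
    all_expr:  list of expression values for every gene on the array
    """
    mu = mean(all_expr)
    sd = stdev(all_expr)
    return {gene: (value - mu) / sd for gene, value in gene_expr.items()}


def differentially_enriched(z_a, z_b, threshold=2.20):
    """Genes above threshold in one population but not in the other."""
    only_a = sorted(g for g in z_a if z_a[g] > threshold and not z_b[g] > threshold)
    only_b = sorted(g for g in z_b if z_b[g] > threshold and not z_a[g] > threshold)
    return only_a, only_b


# Hypothetical toy data, not values from the study.
array_values = [4.0, 6.0, 8.0, 10.0, 12.0]
z_ilc2 = z_scores({"geneA": 16.0, "geneB": 9.0, "geneC": 16.0}, array_values)
z_ilc3 = z_scores({"geneA": 9.0, "geneB": 16.0, "geneC": 16.0}, array_values)
ilc2_only, ilc3_only = differentially_enriched(z_ilc2, z_ilc3)
```

A gene that clears the threshold in both populations (geneC here) is not counted for either, matching the "in one cell population but not the other" criterion.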

Tissue oxygen consumption. A ~20 mg biopsy of iWAT was isolated from directly
below the lymph node and minced in PBS containing 2% BSA, 1.1 mM sodium
pyruvate and 25 mM glucose. Samples were placed in an MT200A Respirometer
Cell (Strathkelvin), and oxygen consumption was measured for approximately
5 min. Oxygen consumption rates were normalized to minced tissue weight.
Primary adipocyte culture. iWAT or BAT was dissected from 4-week-old C57BL/6
mice (n = 5 per experiment, pooled) and digested as described above. Stromal
vascular fraction (SVF) cells were plated in 12-well CellBind plates, and adherent
cells were grown to confluence. Cells were differentiated into adipocytes as
previously described. Briefly, cells were cultured for 2 days with 850 nM insulin, 1 nM
3,3′,5-triiodo-L-thyronine (T3), 1 µM rosiglitazone, 125 nM indomethacin (125 µM
for BAT primary adipocytes), 0.5 mM isobutylmethylxanthine (IBMX) and 1 µM
dexamethasone in adipocyte culture media (DMEM:F12 [50:50] supplemented
with 10% heat-inactivated FBS, penicillin, streptomycin and L-glutamine). Cells
were then maintained in adipocyte culture media supplemented with 850 nM insulin
and 1 nM T3 with either PBS or 50 µM MetEnk for 6 days, with fresh media
replacement every 2 days. Cells were harvested on day 8 in TRIzol.

Statistical analyses. Data are expressed as mean ± standard error of the mean
(s.e.m.). Statistical significance was determined for normally distributed data by
using the two-tailed Student’s t test or a one-way or two-way analysis of variance
(ANOVA) followed by Sidak or Tukey post-hoc tests. If variance differed between
groups, the appropriate statistical correction was applied (for example, Welch’s
correction). Correlation analyses were conducted using Pearson linear regression.
Proportions among human samples were compared by Chi-squared tests. Significance
was set at P < 0.05. Statistical analyses were performed with Prism 6 (GraphPad
Software, Inc.) or SPSS Statistics version 22 (IBM).
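To make the summary statistics concrete, the sketch below computes mean ± s.e.m. and the Welch-corrected t statistic with its Welch–Satterthwaite degrees of freedom using only the Python standard library. It is an illustration of the arithmetic, not a replacement for Prism or SPSS; the function names and data are hypothetical, and obtaining a P value would additionally require a t-distribution routine that the standard library does not provide.

```python
import math
from statistics import mean, stdev, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for unequal variances."""
    # Squared standard errors of each group mean.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical measurements for two groups of five animals.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]
control_mean, control_sem = mean_sem(control)
t_stat, dof = welch_t(control, treated)
```

Because the two groups have different variances, the degrees of freedom fall below the pooled value of 8, which is the effect of the correction.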

Extended Data Figure 1 | Identification of human innate lymphoid cells
(ILCs) in WAT and developmental and functional characterization of
murine ILCs in WAT. a, Gating strategy to identify human ILCs. Stromal
vascular fraction (SVF) cells from human abdominal subcutaneous white
adipose tissues (WAT) were isolated and subjected to flow cytometric analyses.
First plot pre-gated on singlets. Lineage cocktail 1 (Lin1): CD3, CD5, TCRαβ.
Lineage cocktail 2 (Lin2): CD19, CD56, CD11c, CD16. ILCs are identified
as Lin-negative cells that are CD25⁺ CD127⁺. Plots shown are from an obese
donor. b–e, SVF cells from murine epididymal (E)-WAT were isolated and
subjected to flow cytometric analyses. ILCs were defined as live CD45⁺ Lin⁻
CD25⁺ CD127⁺ cells. The lineage (Lin) cocktail included CD3, CD5, CD19,
NK1.1, CD11c, CD11b and FcεRIα. Comparison of Lin⁻ CD25⁺ CD127⁺ cells
in E-WAT of b, Id2⁺/⁺ versus Id2⁻/⁻ bone marrow chimaeras, c, Tcf7⁺/⁺
versus Tcf7⁻/⁻ mice and d, Rag2⁻/⁻ versus Rag2⁻/⁻ γc⁻/⁻ mice. n = 3–8 mice
per group from 2 independent experiments. e, E-WAT SVF cells from C57BL/6
mice were treated with PMA (100 ng ml⁻¹) and ionomycin (1 µg ml⁻¹) in
the presence of brefeldin A (10 µg ml⁻¹) for 4 h and stained for ILCs. Live
CD45⁺ Lin⁻ CD25⁺ CD127⁺ cells were pre-gated, and IL-5 and IL-13 protein
levels were assessed. Plot shown is representative of n = 12 mice from 3
independent experiments. f, Human WAT ILC2 frequencies were compared in
the 7 youngest donors (36.0 ± 3.5 years old) versus the 7 oldest donors
(55.9 ± 1.9 years old). g, Human WAT ILC2 frequencies in female non-obese
donors with body mass index (BMI) < 30.0 kg m⁻² versus female obese donors
with BMI ≥ 30.0 kg m⁻². Student’s t-test. **P < 0.01, ***P < 0.001. Data
are shown as mean ± standard error and are representative of 2 independent
experiments. Sample sizes are biological replicates.

Extended Data Figure 2 | IL-33-deficient mice exhibit dysregulated group 2
innate lymphoid cells (ILC2s) in association with increased adipocyte size
and impaired glucose homeostasis. Il33⁺/⁺ (n = 6) or Il33⁻/⁻ (n = 5)
mice were fed a low-fat diet (10% kcal fat) for 12 weeks starting at 7 weeks of
age. a, Representative plots and frequencies of live CD45⁺ Lin⁻ CD25⁺
IL-33R⁺ ILC2s in epididymal (E)-WAT (data are from Fig. 2a) and iWAT.
Plots pre-gated on CD45⁺ Lin⁻ cells that lack CD3, CD5, CD19, NK1.1,
CD11c, CD11b and FcεRIα. b, Frequencies of IL-5⁺ IL-13⁻ and IL-5⁺ IL-13⁺
ILC2s in E-WAT and iWAT of wild-type and IL-33-deficient mice. E-WAT
stromal vascular fraction cells were treated with PMA (100 ng ml⁻¹) and
ionomycin (1 µg ml⁻¹) in the presence of brefeldin A (10 µg ml⁻¹) for 4 h
before staining for ILC2s and intracellular cytokines. Pre-gated on CD45⁺ Lin⁻
CD25⁺ IL-33R⁺ ILC2s. c, Inguinal white adipose tissue (iWAT) sections were
haematoxylin and eosin stained and imaged at ×40 magnification. Adipocyte
area was calculated from 25–40 adipocytes total from 2–3 images per mouse.
d, 16-h fasting blood glucose concentrations. e, 16-h fasting serum insulin
concentrations. f, Homeostatic model assessment of insulin resistance
(HOMA-IR) index values. g, Glucose tolerance test (GTT) with 2 g per kg body
weight glucose following a 16-h fast. h, Insulin tolerance test (ITT) with 0.5 U
per kg body weight insulin following a 5-h fast. For panels a–f, groups were
compared using Student’s t-test, *P < 0.05, ***P < 0.001. For panels g–h, a
two-way ANOVA with repeated measures was performed followed by Tukey
post-hoc test. *P < 0.05, **P < 0.01, ***P < 0.001. Data shown are from a
single cohort and are representative of 2 independent experiments. Sample sizes
are biological replicates.

Extended Data Figure 3 | IL-33 increases E-WAT ILC2s and regulatory
T cells (Tregs) and abrogates the development of obesity and glucose
intolerance in mice fed a high-fat diet (HFD). Male C57BL/6 mice were
placed on a control diet (CD) or HFD (60% kcal fat) at age 8 weeks. On the first
day of feeding, CD mice were treated with PBS and HFD mice were treated with
PBS or recombinant murine (rm)IL-33 (12.5 µg per kg body weight) once
every 4 days by intraperitoneal injection for 4 weeks. a, E-WAT ILC2 numbers
per gram of adipose, b, body weight, c, relative E-WAT weight and d, relative
iWAT weight at week 4. e, 16-h fasting blood glucose concentrations and
f, glucose tolerance testing during week 3. g, Frequencies and representative
plots of E-WAT Tregs defined as live CD45⁺ CD3⁺ CD4⁺ Foxp3⁺ cells. Plots
are gated on live CD45⁺ CD3⁺ CD4⁺ cells, and numbers are the percentage
of CD4⁺ T cells that are Foxp3⁺ Tregs. h, Numbers of Treg cells per gram of
adipose. All panels include n = 10 mice per group from 2 independent cohorts,
except panel a which includes n = 16 CD PBS and n = 18 HFD PBS from 4
independent cohorts. a–e, One-way ANOVA with Tukey post-hoc test,
*P < 0.05, **P < 0.01, ***P < 0.001. f, Two-way ANOVA with repeated
measures, ***P < 0.001 comparing CD PBS versus HFD PBS, ^^^P < 0.001
comparing HFD PBS versus HFD IL-33. Data are shown as mean ± standard
error. Sample sizes are biological replicates.

Extended Data Figure 4 | Decreased ambulatory activity may limit
hyperphagia following IL-33 treatment, but IL-33 does not appear to have
direct suppressive effects on food intake or ambulatory activity. a, Male
C57BL/6 mice were treated with PBS or recombinant murine (rm)IL-33
(12.5 µg per kg body weight) daily for 7 days (PBS, n = 10; rmIL-33, n = 12).
Over a 24-h period between days 6 and 7, food intake and ambulatory activity
were measured over 15-min intervals. The average difference in food intake
or ambulatory activity between PBS- and rmIL-33-treated mice was calculated
for each 15-min interval, and the differences in food intake and ambulatory
activity were related by linear regression. Solid line, best-fit line. Dashed curves,
upper and lower 95% confidence intervals around the best-fit line. Data are
shown as mean differences for each interval and are representative of 2
independent experiments. b–d, Male C57BL/6 mice were treated with PBS or
recombinant murine (rm)IL-33 (12.5 µg per kg body weight) once and
monitored for the first 3 h post-treatment using CLAMS cages (n = 4 per
group). b, Energy expenditure, c, food intake and d, ambulatory activity (beam
breaks) were measured over the first 3 h post-treatment. Student’s t-test.
***P < 0.001. Data are shown as mean ± standard error and are representative
of 1 independent experiment. Sample sizes are biological replicates.

Extended Data Figure 5 | Brown adipose tissue (BAT) contains Lin⁻
CD25⁺ IL-33R⁺ ILC2s that expand in response to IL-33 in association with
decreased Ucp1 expression. C57BL/6 male mice (10 weeks old) were treated
with PBS (n = 8) or IL-33 (12.5 µg per kg body weight, n = 8) daily by
intraperitoneal injection for 7 days. a, Representative plots and frequencies of
Lin⁻ CD25⁺ IL-33R⁺ ILC2s in interscapular BAT. Gated on live CD45⁺ Lin⁻
cells. b, Numbers of ILC2s per gram of BAT. c, Ucp1 expression in BAT by
real-time PCR. d, UCP1 immunohistochemistry of BAT at ×10 magnification.
Scale bars, 100 µm. e, ×40 magnification of d. Scale bars, 100 µm. Student’s
t-test, *P < 0.05, ***P < 0.001. Data are shown as mean ± standard error
and are representative of 2 independent experiments. Sample sizes are
biological replicates.

Extended Data Figure 6 | ILC2s from E-WAT accumulate in white adipose
tissue of recipient mice and expand in response to IL-33 to promote beiging.
a, Experimental design for panels a, b. Live CD45⁺ Lin⁻ CD25⁺ IL-33R⁺
ILC2s were sort-purified from E-WAT of CD45.1⁺ mice treated with 12.5 µg
per kg body weight recombinant murine (rm)IL-33 daily for 7 days by
intraperitoneal injection. PBS (n = 8) or ILC2s (1 × 10⁵ total, n = 8) were
transferred to CD45.2⁺ recipient mice daily for 4 days by subcutaneous
injection near iWAT (5 × 10⁴ ILC2s, split evenly bilaterally) and
intraperitoneal injection (5 × 10⁴ ILC2s). Tissues were harvested on day 5 for
analyses. b, Donor and recipient ILC2s in E-WAT, brown adipose tissue (BAT),
mesenteric lymph nodes (mLN) and lung. iWAT ILC2 plots from this
experiment are shown in main Fig. 3g. Pre-gated on live CD45⁺ Lin⁻ CD25⁺
IL-33R⁺ ILC2s. Donor ILC2s are defined as CD45.1⁺ CD45.2⁻, whereas
recipient ILC2s are defined as CD45.1⁻ CD45.2⁺. Representative plots shown.
Frequencies represent percent of ILC2s that are recipient or donor cells.
Student’s t-test, ***P < 0.001. c, Experimental design for panels c–e.
Sort-purified CD45.1⁺ ILC2s (10⁵) from E-WAT of IL-33-treated mice
(as described above) were transferred into Rag2⁻/⁻ γc⁻/⁻ recipients by a single
intraperitoneal injection. ILC2-sufficient Rag2⁻/⁻ mice, ILC2-deficient
Rag2⁻/⁻ γc⁻/⁻ mice and ILC2-reconstituted Rag2⁻/⁻ γc⁻/⁻ mice were treated
with PBS or rmIL-33 (12.5 µg per kg body weight) by intraperitoneal injection
daily for 7 days. There were n = 4 mice per group. This experimental design
corresponds to main Fig. 3l–m. d, Representative plots of live CD45.1⁺ Lin⁻
CD25⁺ IL-33R⁺ ILC2s in E-WAT. Blue, recipient cells. Red, donor cells.
Lineage cocktail includes CD3, CD5, CD19, NK1.1, CD11c, CD11b and
FcεRIα. e, iWAT UCP1 IHC. Scale bars, 100 µm. ANOVA with Tukey post-hoc
test, ***P < 0.001. Data are shown as mean ± standard error and are
representative of 2 independent experiments. Sample sizes are biological
replicates.

Extended Data Figure 7 | IL-33 treatment and ILC2 transfer can elicit
beiging independently of eosinophils and IL-4Rα signalling. a–f, Wild-type
(Balb/c), DblGata1 mice that lack eosinophils or Il4ra⁻/⁻ mice that have
dysregulated alternatively activated macrophages (AAMacs) (both mutant
strains on a Balb/c background) were treated with PBS or recombinant murine
(rm)IL-33 (12.5 µg per kg body weight) daily by intraperitoneal injection for
7 days. a, iWAT ILC2 numbers per gram of adipose and b, iWAT UCP1
immunohistochemistry (IHC) in Balb/c mice (PBS, n = 4; rmIL-33, n = 3).
c, iWAT ILC2 numbers per gram of adipose and d, iWAT UCP1 IHC in
DblGata1 mice (PBS, n = 5; rmIL-33, n = 6). e, iWAT ILC2 numbers per gram
of adipose and f, iWAT UCP1 IHC in Il4ra⁻/⁻ mice (PBS, n = 4; rmIL-33,
n = 6). g, h, Live CD45⁺ Lin⁻ CD25⁺ IL-33R⁺ ILC2s were sort-purified
from E-WAT of C57BL/6 mice treated with rmIL-33 (12.5 µg per kg body
weight) daily for 5–7 days by intraperitoneal injection, for transfer to Rag1⁻/⁻
mice on a C57BL/6 background. ILC2s (1 × 10⁵ total) were transferred to recipient
mice daily for 4 days by subcutaneous injection (PBS, n = 8; ILC2, n = 8).
g, iWAT ILC2 numbers per gram of adipose and h, iWAT UCP1 IHC. i–l, Live
CD45⁺ Lin⁻ CD25⁺ IL-33R⁺ ILC2s were sort-purified from E-WAT of Balb/c
mice treated with rmIL-33 (12.5 µg per kg body weight) daily for 5–7 days
by intraperitoneal injection. ILC2s (1 × 10⁵ total) were transferred to recipient
mice daily for 4 days by subcutaneous injection. i, iWAT ILC2 numbers per
gram of adipose and j, iWAT UCP1 IHC in DblGata1 recipients (PBS, n = 6;
ILC2, n = 6). k, iWAT ILC2 numbers per gram of adipose and l, iWAT UCP1
IHC in Il4ra⁻/⁻ recipients (PBS, n = 3; ILC2, n = 4). Scale bars, 100 µm.
Student’s t-test, *P < 0.05. Data are shown as mean ± standard error and are
representative of 2 independent experiments. Sample sizes are biological
replicates.

Extended Data Figure 8 | Summary model linking the IL-33/ILC2/MetEnk
pathway to the regulation of beiging and obesity. Interleukin (IL)-33 acts 
on group 2 innate lymphoid cells (ILC2s) to upregulate production of the 
effector molecules IL-5, IL-13 and enkephalin peptides. ILC2-derived IL-5 
promotes eosinophil homeostasis in WAT, and eosinophils in turn produce 
IL-4 to sustain alternatively activated macrophages (AAMacs) in WAT. 
ILC2-derived IL-13 can also promote AAMac responses. In the setting of 
chronic exposure to a cold environment, eosinophil-derived IL-4 stimulates 
AAMacs to produce catecholamines such as noradrenaline, which acts directly 
on beige adipocytes to upregulate uncoupling protein 1 (UCP1) expression and 
promote mitochondrial biogenesis. Although it remains unknown whether 
ILC2-derived IL-5 and IL-13 contribute to cold-stress-induced beiging, 
ILC2-derived enkephalin peptides can act directly on beige adipocytes to 
upregulate UCP1 and promote beiging. This results in increased energy 
expenditure and decreased adiposity that may counteract weight gain. In the 
setting of obesity, IL-33 expression in WAT is increased; however, WAT ILC2s 
are paradoxically decreased in both mice and humans, suggesting that the 
IL-33/ILC2 axis is dysregulated in obesity. This may impede the ability of ILC2s 
to contribute to the function of beige fat, resulting in a vicious cycle that 
promotes weight gain.

Extended Data Table 1 | Characteristics of non-obese and obese human donors

Characteristic                      Non-obese      Obese          P value*
Source (CHTN/NYODP)                 29%/71%        43%/57%        P = 0.43
Age (years)                         39.3 ± 5.2     52.6 ± 2.7     P = 0.042
Sex, % female                       43%            87%            P = 0.094
BMI (kg/m²)                         23.5 ± 1.4     42.6 ± 3.9     P = 0.0006
History of type 2 diabetes          14%            43%            P = 0.24
History of liver disease            0%             0%             n/a
History of cardiovascular disease   0%             13%            P = 0.30

BMI, body mass index; CHTN, Cooperative Human Tissue Network; NYODP, New York Organ Donor Program.
* Proportions were compared by χ² tests. Continuous variables were compared by Student's t-test. Exact P values are shown.

Extended Data Table 2 | List of genes with single nucleotide polymorphisms associated with human obesity

Human gene symbol   Human gene name   Murine ortholog gene symbol   Inclusion in microarray
ADAMTS9 A disintegrin-like and metallopeptidase (reprolysin type) with thrombospondin type 1 motif, 9 Adamts9 Yes
BCDIN3D BCDIN3 domain containing Bcdin3d Yes
BDNF Brain-derived neurotrophic factor Bdnf Yes 
CADM2 Cell adhesion molecule 2 Cadm2 Yes 
CNR1 Cannabinoid type 1 receptor Cnr1 Yes 
CPEB4 Cytoplasmic polyadenylation element binding protein 4 Cpeb4 Yes 
CTNNBL1 Catenin, beta like 1 Ctnnbl1 Yes 
DLK1 Delta-like homologue 1 Dlk1 Yes
ENPP1 Ectonucleotide pyrophosphatase/phosphodiesterase 1 Enpp1 Yes
ETV5 Ets variant 5 Etv5 Yes 
FAIM2 Fas apoptotic inhibitory molecule 2 Faim2 Yes 
FANCL Fanconi anemia, complementation group L Fancl Yes 
FTO Fat mass and obesity associated Fto Yes
GHSR Growth hormone secretagogue receptor Ghsr Yes
GIPR Gastric inhibitory polypeptide receptor Gipr Yes 
GNPDA2 Glucosamine-6-phosphate deaminase 2 Gnpda2 Yes 
GPRC5B G protein-coupled receptor, family C, group 5, member B Gprc5b Yes 
GRB14 Growth factor receptor-bound protein 14 Grb14 Yes 
HMGA1 High mobility group AT-hook 1 Hmga1 Yes 
HMGCR 3-hydroxy-3-methylglutaryl-CoA reductase Hmgcr Yes
HOXC13 Homeobox C13 Hoxc13 Yes
ITPR2 Inositol 1,4,5-trisphosphate receptor, type 2 Itpr2 Yes 
KCTD15 Potassium channel tetramerization domain containing 15 Kctd15 Yes
KLF7 Kruppel-like factor 7 Klf7 Yes
LEP Leptin Lep Yes 
LEPR Leptin receptor Lepr Yes 
LMNA Lamin A/C Lmna Yes 
LRP1B Low density lipoprotein receptor-related protein 1B Lrp1b Yes 
LINGO2 (LRRN6C) Leucine rich repeat and Ig domain containing 2 Lingo2 Yes
LY86 Lymphocyte antigen 86 Ly86 Yes 
LYPLAL1 Lysophospholipase-like 1 Lyplal1 Yes 
MAF v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog Maf Yes 
MAP2K5 Mitogen-activated protein kinase kinase 5 Map2k5 Yes 
MC4R Melanocortin 4 receptor Mc4r Yes 
MSRA Methionine sulfoxide reductase A Msra Yes 
MTCH2 Mitochondrial carrier 2 Mtch2 Yes 
MTIF3 Mitochondrial translational initiation factor 3 Mtif3 Yes 
MTMR9 Myotubularin related protein 9 Mtmr9 Yes 
NAMPT Nicotinamide phosphoribosyltransferase Nampt Yes 
NCR3 Natural cytotoxicity triggering receptor 3 Ncr3 No
NEGR1 Neuronal growth regulator 1 Negr1 Yes 
NFE2L3 Nuclear factor, erythroid 2-like 3 Nfe2l3 Yes 
NPC1 Niemann-Pick disease, type C1 Npc1 Yes 
NPY2R Neuropeptide Y receptor Y2 Npy2r Yes 
NRXN3 Neurexin 3 Nrxn3 Yes 
PCSK1 Proprotein convertase subtilisin/kexin type 1 Pcsk1 Yes 
PIGC Phosphatidylinositol glycan anchor biosynthesis, class C Pigc Yes 
POMC Proopiomelanocortin Pomc Yes 
PRKD1 Protein kinase D1 Prkd1 Yes 
PRL Prolactin Prl Yes
PTBP2 Polypyrimidine tract binding protein 2 Ptbp2 Yes 
PTER Phosphotriesterase related Pter Yes 
RSPO3 R-spondin 3 Rspo3 Yes 
SDCCAG8 Serologically defined colon cancer antigen 8 Sdccag8 Yes 
SEC16B SEC16 Homolog B Sec16b Yes 
SH2B1 SH2B adaptor protein 1 Sh2b1 Yes 
SLC39A8 Solute carrier family 39 (zinc transporter), member 8 Slc39a8 Yes
SNRPN Small nuclear ribonucleoprotein polypeptide N Snrpn Yes
SOCS1 Suppressor of cytokine signaling 1 Socs1 Yes
SOCS3 Suppressor of cytokine signaling 3 Socs3 Yes 
STAB1 Stabilin 1 Stab1 Yes 
TBC1D1 TBC1 (tre-2/USP6, BUB2, cdc16) domain family, member 1 Tbc1d1 Yes 
TBX15 T-box 15 Tbx15 Yes 
TFAP2B Transcription factor AP-2 beta (activating enhancer binding protein 2 beta) Tfap2b No 
TMEM160 Transmembrane protein 160 Tmem160 Yes 
TMEM18 Transmembrane protein 18 Tmem18 Yes 
TNNI3K TNNI3 interacting kinase Tnni3k Yes
TUB Tubby bipartite transcription factor Tub Yes 
VEGFA Vascular endothelial growth factor A Vegfa Yes 
ZNF608 Zinc finger protein 608 Zfp608 Yes 
ZNRF3 Zinc and ring finger 3 Znrf3 Yes 


Genes are derived from references 26 and 27.

Crystal structure of the human OX₂ orexin receptor
bound to the insomnia drug suvorexant


Jie Yin¹, Juan Carlos Mobarec², Peter Kolb² & Daniel M. Rosenbaum¹


The orexin (also known as hypocretin) G protein-coupled receptors
(GPCRs) respond to orexin neuropeptides in the central nervous
system to regulate sleep and other behavioural functions in humans¹.
Defects in orexin signalling are responsible for the human diseases
of narcolepsy and cataplexy; inhibition of orexin receptors is an
effective therapy for insomnia². The human OX₂ receptor (OX₂R) belongs
to the β branch of the rhodopsin family of GPCRs³, and can bind to
diverse compounds including the native agonist peptides orexin-A
and orexin-B and the potent therapeutic inhibitor suvorexant⁴. Here,
using lipid-mediated crystallization and protein engineering with a
novel fusion chimaera, we solved the structure of the human OX₂R
bound to suvorexant at 2.5 Å resolution. The structure reveals how
suvorexant adopts a π-stacked horseshoe-like conformation and binds
to the receptor deep in the orthosteric pocket, stabilizing a network of
extracellular salt bridges and blocking transmembrane helix motions
necessary for activation. Computational docking suggests how other
classes of synthetic antagonists may interact with the receptor at a
similar position in an analogous π-stacked fashion. Elucidation of
the molecular architecture of the human OX₂R expands our
understanding of peptidergic GPCR ligand recognition and will aid
further efforts to modulate orexin signalling for therapeutic ends.

The orexin system modulates diverse behaviours in mammals,
including sleep, arousal and feeding. Orexin neurons in the lateral
hypothalamus uniquely produce the 33-amino-acid orexin-A and 28-amino-acid
orexin-B neuropeptides. Orexin receptors OX₁R and OX₂R,
distributed throughout the central nervous system, respond to these peptides
to control neuronal activity. Signals generated from the hypothalamus,
the limbic system and the periphery converge on the orexin neurons,
which act as central integrators of environmental cues and extend
processes to many different brain centres. The orexin receptors belong to
the rhodopsin family of GPCRs and relay neuropeptide binding at
synapses into intracellular activation of heterotrimeric Gq/11 and Gi/o (ref. 5).
The importance of the orexin system in phasic control of sleep–wake
cycles was highlighted by discoveries that disruption/deletion of orexin
or OX₂R causes narcolepsy in dogs⁶, mice⁷ and humans⁸. As a result, a
number of potent dual orexin receptor antagonists (DORAs) have been
developed and tested over the past decade, culminating in US Food and
Drug Administration (FDA) approval of the first-in-class drug
suvorexant (Belsomra) for insomnia. Suvorexant binds to human OX₁R
and OX₂R (hOX₁R and hOX₂R) with sub-nanomolar affinity, potently
inhibits orexin receptor signalling in cell-based assays, and promotes
the transition to rapid eye movement (REM) and slow wave sleep in
animals and humans.

To understand better the molecular basis of orexin receptor ligand 
recognition and signalling, we sought to obtain a high-resolution X-ray 
crystal structure of hOX,R. Protein engineering (fusion proteins’® and 
thermostable mutants") and lipid-mediated crystallization methods” 
have recently enabled the determination of the structures of GPCRs 
for diverse ligands such as biogenic amines, nucleotides, peptide hor- 
mones and lipids. OX,R and OX,R belong to the 8 branch of the rho- 
dopsin family of GPCRs, which contains receptors for neuropeptides 


such as the tachykinins, oxytocin/vasopressin and neurotensin’. Crystal 
structures of thermostabilized mutants of the rat neurotensin receptor
(NTSR1), in partially active (ref. 13) and inactive (ref. 14) conformations, constitute
the only crystallographic data currently available for this physiologically
important group of GPCRs. Our attempts to express and crystallize
a hOX2R-T4L fusion protein, an approach we originally developed for
the β2 adrenergic receptor (β2AR) (ref. 10), were unsuccessful.

We therefore explored the use of alternative fusion protein partners 
that would help hOX,R pack into a well-defined three-dimensional lat- 
tice. For candidate fusion partners, we searched for domains of fewer 
than 200 amino acids from extreme thermophiles, which had been previously
crystallized and characterized by X-ray diffraction at high resolution
and have amino and carboxy termini that are close (within 10 Å)
in three-dimensional space. Using a construct in which the 196-amino-acid
catalytic domain of Pyrococcus abyssi glycogen synthase (PGS) (ref. 15)
replaced 39 residues of the third intracellular loop (ICL3), we were able
to grow microcrystals of hOX2R in a cholesterol-doped monoolein cubic
phase (Extended Data Fig. 2) and solve the suvorexant-bound structure
at 2.5 Å resolution (Extended Data Table 1). As expected, the PGS
domain promotes tight packing of hOX2R into a crystal lattice in which
membrane layers containing the embedded GPCR alternate with aque- 
ous layers containing the fusion partner (Fig. 1a). 
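
The selection criteria for candidate fusion partners can be written as a simple filter. The sketch below is illustrative only: the `Domain` record, the example entries and their coordinates are hypothetical, and the high-resolution criterion is left out because the text does not state a cut-off.

```python
import math
from dataclasses import dataclass


@dataclass
class Domain:
    """Hypothetical record for a previously crystallized protein domain."""
    name: str
    length: int          # number of amino acids
    thermophile: bool    # True if the source organism is an extreme thermophile
    n_term: tuple        # (x, y, z) of the first modelled residue, in angstroms
    c_term: tuple        # (x, y, z) of the last modelled residue, in angstroms


def termini_distance(domain):
    """Distance between the amino and carboxy termini, in angstroms."""
    return math.dist(domain.n_term, domain.c_term)


def is_candidate(domain, max_length=200, max_termini_distance=10.0):
    """Apply the three criteria stated in the text: thermophile origin,
    fewer than 200 residues, and termini within 10 angstroms."""
    return (
        domain.thermophile
        and domain.length < max_length
        and termini_distance(domain) <= max_termini_distance
    )


# Made-up examples; none of these coordinates comes from a real structure.
domains = [
    Domain("domain_A", 196, True, (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),
    Domain("domain_B", 150, True, (0.0, 0.0, 0.0), (0.0, 0.0, 25.0)),
    Domain("domain_C", 320, True, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    Domain("domain_D", 120, False, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)),
]

candidates = [d.name for d in domains if is_candidate(d)]
print(candidates)
```

Close termini matter because the fusion partner replaces part of ICL3: both ends of the inserted domain must reach the cytoplasmic ends of TM5 and TM6 without distorting the helices.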

The overall seven-transmembrane (TM) fold of hOX2R resembles 
other GPCR structures (Fig. 1a, b). Despite a low sequence similarity 
(23% identity), the backbone root mean square deviation (r.m.s.d.) relative
to the inactive-state β2AR (ref. 10) is only 2.2 Å. The backbone r.m.s.d.
compared with NTSR1 (22% identity) is 1.3 Å for the inactive-state (ref. 14)
and 2.3 Å for the partial active-state conformation (ref. 13). At the extracellular
surface, residues 190-212 in the second extracellular loop (ECL2)
form a β-hairpin (Fig. 1c, d) analogous to that seen in other peptide-binding
GPCRs such as NTSR1 (ref. 13), the μ-opioid receptor (ref. 16) and
CXCR4 (ref. 17). This β-hairpin structure contains amino acids important
for orexin binding and activation (ref. 18).
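
For readers unfamiliar with the r.m.s.d. values quoted above, the quantity is the square root of the mean squared distance between equivalent atoms. The following is a minimal sketch that assumes the two structures have already been superposed and that the atoms are listed in matching order; the superposition procedure itself is not described in the text and is not reproduced here, and the coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates, in the same units as the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy backbone: every atom in the second set is shifted by 2 angstroms along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]

print(rmsd(reference, shifted))
```

Because every atom in the toy example is displaced by the same 2 Å, the r.m.s.d. is also 2 Å; in a real comparison, a few large local differences can dominate the value.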

Superposition of suvorexant-bound hOX2R with the antagonist-bound
M3 muscarinic acetylcholine receptor (ref. 19), another Gq-coupled GPCR,
shows a high degree of overlap between TM backbones at the intracellular
surface (Fig. 1b). One difference is that the conserved 'DRY motif'
on TM3, part of an inhibitory interaction network in the rhodopsin
family of GPCRs (ref. 20), is DRWY in hOX2R. Residues D151³.⁴⁹ and R152³.⁵⁰
(superscripts are Ballesteros-Weinstein numbering throughout) make
an intra-motif salt bridge, while R152³.⁵⁰ and W153³.⁵¹ contact the cytoplasmic
ends of TM5 and TM6 (Q245⁵.⁶⁰, I246⁵.⁶¹ and L306⁶.³⁷). Overall,
numerous hydrophobic and polar contacts bind TM5 and TM6
to the other TM domains, restricting the outward movement of these
α-helices necessary for GPCR activation. The suvorexant-bound hOX2R
structure thus represents an inactive-state conformation, consistent
with the efficacy profile of suvorexant as a DORA ligand.
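
The Ballesteros-Weinstein labels used here follow a simple rule: the most conserved residue of each helix is assigned position 50, and every other residue in that helix is counted relative to it. A minimal sketch of that arithmetic is given below. The reference table is an assumption of the sketch: only helix 3 is filled in, back-calculated from the label 3.49 given for D151 in the text.

```python
# X.50 reference residues per helix. Only helix 3 is filled in here; it is
# back-calculated from the label D151 = 3.49 quoted in the text (an assumption
# of this sketch, not a value read from the deposited structure).
REFERENCE_RESIDUE = {3: 152}


def bw_label(helix, residue_number, reference=REFERENCE_RESIDUE):
    """Return the Ballesteros-Weinstein label 'helix.position' for a residue."""
    if helix not in reference:
        raise KeyError(f"no X.50 reference residue known for helix {helix}")
    position = 50 + (residue_number - reference[helix])
    return f"{helix}.{position}"


for name, number in [("D", 151), ("R", 152), ("W", 153), ("T", 135)]:
    print(f"{name}{number} -> {bw_label(3, number)}")
```

Extending the table to other helices requires the X.50 residue of each helix, which should be taken from a sequence alignment rather than guessed.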

The suvorexant-binding pocket is open to the extracellular space 
through a constricted solvent-accessible channel (Fig. 2a) rimmed by 
amino acids from the extracellular ends of TM2, TM5-7 and the ECL2 
β-hairpin. A complex network of electrostatic interactions covers the
extracellular surface of the receptor, including salt bridges on both
sides of the ligand entry channel (D115².⁶⁵-H350⁷.³⁹, E118².⁶⁸-R339⁷.²⁸,
D211⁴⁵.⁵¹-R328⁶.⁵⁹, E212⁴⁵.⁵²-H224⁵.³⁹) that stabilize the extracellular
TM conformation (Figs 1c and 2a). A similar extracellular salt bridge in
β2AR (ECL2 to TM3) was previously shown to be a ligand-dependent
switch by NMR spectroscopy (ref. 21). Mutation of residue D211⁴⁵.⁵¹ to Ala has
one of the greatest characterized deleterious effects on orexin-A potency,
but has little impact on binding of some DORAs such as almorexant (ref. 18);
this amino acid is over 6 Å more extracellular than the closest suvorexant
atom in the crystal structure. The difference between orexin and
DORA sensitivity to D211A⁴⁵.⁵¹ suggests that modulation of, or competition
for, the extracellular salt bridges may be involved in orexin binding
and activation of the receptor. In further support of this hypothesis,
the neurotensin agonist peptide NTS8-13 present in the partially active
NTSR1 structure (ref. 13) occupies a more extracellular position than suvorexant,
adjacent to the β-hairpin, stabilizing a slight inward movement
of TM5 and TM6 (Fig. 1d). Such inward movements of TM5 and TM6
relative to the rest of the TM bundle at the orthosteric binding pocket
may be a general trigger for agonist-mediated GPCR activation, as they
have also been observed for the β2AR (ref. 22) and the M2 muscarinic
acetylcholine receptor (ref. 23).

Suvorexant sterically inhibits inward motions of TM domains by
lodging deep in the orthosteric site and contacting all TM α-helices
except TM1 (Fig. 2b, c). The shape of suvorexant in the ligand-binding
pocket resembles a horseshoe, due to a boat conformation of the diazepane
ring and intramolecular π-stacking between the aromatic benzoxazole
and p-toluamide groups (Fig. 2b, c and Extended Data Fig. 3).
A similar conformation of a suvorexant analogue was previously found
in small-molecule crystals and by NMR experiments in solution, indicating
that the horseshoe probably represents a low-free-energy state of
the isolated ligand. Most of the ligand contacts involve van der Waals
interactions or aromatic packing, with few direct polar interactions aside
from a notable hydrogen bond from N324⁶.⁵⁵ to suvorexant's tertiary
amide carbonyl. Several water-mediated hydrogen bonds form bridges
between suvorexant and polar amino acids such as N324⁶.⁵⁵ and H350⁷.³⁹
(Fig. 2b, c). Although the effects of mutagenesis on suvorexant affinity
to hOX2R have not been reported, certain Ala mutants appear to have
a broad deleterious effect on DORA binding (ref. 18): W214⁴⁵.⁵⁴ and Y223⁵.³⁸
do not directly participate in suvorexant binding, but are critical to the
structural integrity of the ECL2 β-hairpin; F227⁵.⁴² at the base of the

Figure 1 | Fusion protein engineering and
structural features of hOX2R. a, Left, global
structure of the hOX2R-PGS fusion protein.
hOX2R is represented as an orange cartoon, with
the PGS domain (grey cartoon) fused at ICL3.
Suvorexant is shown as spheres with yellow
carbons. Dotted line represents the five amino acids
that could not be modelled at the tip of the
β-hairpin in ECL2. Right, packing of the hOX2R-PGS
fusion protein in the lipidic-cubic-phase-derived
crystal lattice. b, Overlap between
suvorexant-bound hOX2R (orange cartoon) and
antagonist-bound M3R (ref. 19) (green cartoon; Protein
Data Bank (PDB) accession 4DAJ) at the
intracellular surface. The DRWY sequence on
TM3 and interacting residues on TM5 and TM6 are
shown as blue sticks. c, Salt-bridge network at
the extracellular surface of hOX2R. Residues
participating in electrostatic interactions are shown
as magenta sticks, with suvorexant represented as
spheres with yellow carbons. ECL3 is removed
for clarity. d, Superposition of hOX2R (orange
cartoon) and NTSR1 (blue cartoon) in a partial
active-state conformation (PDB accession
4GRV) (ref. 13). Suvorexant (yellow carbons) and the
NTS8-13 agonist (teal carbons) are shown as
transparent sticks. ECL3 is removed for clarity.

Figure 2 | Suvorexant interaction with hOX2R. a, Solvent-accessible channel
to the ligand-binding site. The solvent-accessible surface of the receptor is
coloured according to electrostatic potential. Suvorexant is shown as spheres
with yellow carbons. b, Two-dimensional schematic of contacts between
suvorexant and the receptor. c, Three-dimensional interaction between
suvorexant and hOX2R, showing all residues within 4 Å of the ligand as sticks
with orange carbons. Hydrogen bonds are shown as black dashes. H2O 4025 is
shown as a red sphere.

Figure 3 | Docked poses for synthetic orexin receptor antagonists. a, Left,
chemical structure of suvorexant. Right, recapitulated binding mode of
suvorexant (purple carbons) superimposed with the observed pose in the


binding pocket packs against the methyl-diazepane ring; Y317⁶.⁴⁸ contacts
the 1,2,3-triazole; and H350⁷.³⁹ π-stacks with the p-toluamide
group. The residues surrounding suvorexant and the ligand entry channel
are almost identical between hOX1R and hOX2R (Extended Data
Figs 4 and 5), explaining suvorexant's ability to bind tightly and inhibit
both receptors (ref. 9). Out of 30 residues that are within 6 Å of suvorexant
in the hOX2R structure, only two amino acids are different compared
with hOX1R: T111².⁶¹ is changed to Ser and T135³.³³ is changed
to Ala (overall sequence identity, 67%). This sequence conservation
also implies that the 12-fold higher orexin-B affinity (and 40-fold higher
potency) for hOX2R over hOX1R (ref. 25) is probably due to differences in
interactions that are remote from the deeply membrane-embedded
orthosteric binding pocket.
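
The contrast between the conservation of the binding pocket and that of the receptor as a whole can be made explicit with a short calculation. The sketch below uses one common convention for percentage identity (identical positions divided by aligned, ungapped positions); the convention behind the 67% figure is not stated in the text, and the aligned sequences are toy strings rather than receptor sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither
    sequence has a gap. Both inputs must come from the same alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment: five ungapped columns, four of them identical.
print(percent_identity("TTAC-G", "STAC-G"))

# Pocket conservation from the counts given in the text:
# 30 residues within 6 angstroms of suvorexant, 2 of which differ.
pocket_identity = 100.0 * (30 - 2) / 30
print(round(pocket_identity, 1))
```

By this count roughly 93% of the residues lining the pocket are identical between the two receptors, compared with 67% over the full sequences.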

We have previously used computational docking methods to effectively
predict interactions between a GPCR of known structure and
small-molecule ligands (ref. 26). With the newly available hOX2R structure, we
carried out molecular docking calculations (see Methods) to generate
possible binding modes for three additional high-affinity orexin receptor
antagonists that have chemical scaffolds distinct from suvorexant:
almorexant (ref. 27), EMPA (ref. 28) and SB-674042 (ref. 29). As a control, we showed
that our docking protocols were capable of accurately reproducing the
interaction between suvorexant and hOX2R in the crystal structure
(Fig. 3a). Predicted poses for each of the other docked ligands establish
a hydrogen bond with N324⁶.⁵⁵ (Fig. 3b-d), and two of the three adopt
a π-stacked horseshoe-like conformation that mimics the binding of
suvorexant (Fig. 3b, d). The amide functionality of almorexant forms
a bidentate hydrogen bond with N324⁶.⁵⁵ and H350⁷.³⁹ (Fig. 3b), and
mutation of the latter residue to Ala was shown experimentally to
reduce binding affinity for hOX2R (ref. 18). In the predicted pose for EMPA,
hydrogen bonds are established between the methoxy substituent on
crystal structure (yellow carbons). Hydrogen bonds are shown as black dashes.
b-d, Chemical structure and predicted binding mode of almorexant (green
carbons) (b), EMPA (cyan carbons) (c) and SB-674042 (tan carbons) (d).


the 2-methoxypyridine and T111².⁶¹ and Y354⁷.⁴³ on the receptor
(Fig. 3c), both of which are implicated in EMPA's interaction by mutational
data (refs 18, 30). In contrast to the other two molecules, no EMPA pose
featured intramolecular π-stacking similar to suvorexant. For almorexant
and EMPA, docking also yielded favourably scored second binding
modes consistent with mutational data (Extended Data Fig. 6a, b).
Finally, the predicted pose for SB-674042 closely resembles the binding
mode of suvorexant, with its phenyl-oxadiazole overlapping almost
perfectly with suvorexant's benzoxazole and its 2-methyl-thiazole overlapping
with suvorexant's triazole (Fig. 3d). Overall, the prediction of
intramolecular π-stacked conformations for multiple docked orexin
receptor antagonists suggests that this property may be a general favourable
design feature for synthetic molecules targeting the orthosteric site
of hOX2R.

We solved a high-resolution crystal structure of hOX2R bound to
the therapeutic compound suvorexant, providing a molecular framework
for understanding DORA binding and stabilization of the inactive
state by a salt-bridge network at the extracellular surface. Docking
calculations predict putative stable binding modes for other orexin receptor
antagonists, which are consistent with known mutational data. This
knowledge will serve as a powerful tool in the design of improved agents
that can activate or inactivate orexin signalling.


Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references unique
to these sections appear only in the online paper.
METHODS

Cloning, expression and purification. A DNA fragment corresponding to residues
1-386 of hOX2R was cloned into a modified pFastBac (Invitrogen) baculovirus
expression vector with the haemagglutinin (HA) signal sequence followed by
the Flag tag at the N terminus (ref. 31). The 58 C-terminal (intracellular) amino acids of
hOX2R were omitted owing to the prediction that they are unstructured and do not
form part of the 7TM bundle. The hOX2R-PGS fusion protein construct was
generated by substituting a synthetic DNA fragment containing the 196-amino-acid
coding sequence of P. abyssi glycogen synthase (PDB accession 2BFW) (ref. 15) for residues
255-293 in the hOX2R ICL3 using an adapted Multi-Site QuikChange protocol
(Stratagene). For purification, a deca-histidine tag was added at the C terminus.
The resulting construct was transfected into Sf9 cells to produce a recombinant
baculovirus with the Bac-to-Bac system (Invitrogen). Sf9 cultures were infected
with recombinant baculovirus at a cell density of 3 × 10⁶ per ml and 1 μM suvorexant
was added to the media. Infected cells were grown for 48 h at 27 °C, and cells
were harvested and stored at −80 °C for future use.

Sf9 cell membranes were lysed in a hypotonic buffer containing 10 mM Tris pH
7.5, 1 mM EDTA, 160 μg ml⁻¹ benzamidine, 100 μg ml⁻¹ leupeptin, 2 mg ml⁻¹
iodoacetamide and 1 μM suvorexant (Selleck Chemicals). Lysed membranes were
re-suspended and homogenized by Dounce in a buffer containing 50 mM Tris pH 7.5,
500 mM NaCl, 1% (w/v) n-dodecyl-β-D-maltopyranoside (DDM; Anatrace), 0.2%
sodium cholate, 0.2% cholesteryl hemi-succinate (CHS), 10% glycerol, 2 mg ml⁻¹
iodoacetamide and 5 μM suvorexant. Solubilization proceeded for 1 h at 4 °C, followed
by ultracentrifugation for 30 min at 100,000g. After centrifugation, the
solubilized supernatant supplemented with 20 mM imidazole was incubated with
Ni-NTA agarose beads (GE Healthcare) in batch-binding mode for 3 h at 4 °C.
After binding, beads were washed with 15 column volumes of Ni-NTA buffer: 50 mM
Tris pH 7.5, 500 mM NaCl, 0.1% DDM, 0.02% sodium cholate, 0.02% CHS, 5%
glycerol, 50 mM imidazole and 5 μM suvorexant. Protein was eluted with 5 column
volumes of Ni-NTA wash buffer with 200 mM imidazole. The eluate from nickel-affinity
chromatography was supplemented with 2 mM calcium and loaded onto
M1 anti-Flag affinity beads (Sigma). Detergent was exchanged on the M1 resin
from DDM to 0.05% lauryl maltose neopentyl glycol (LMNG; Anatrace). Receptor
was eluted from the M1 beads with 200 μg ml⁻¹ Flag peptide plus 5 mM EDTA. To
remove N-linked glycan from the receptor, PNGaseF (NEB) was added and the
reaction was incubated at 4 °C overnight. Finally, protein was concentrated in a
100 kDa cut-off Vivaspin column (Sartorius) and run on a Superdex 200 size exclusion
column (GE Healthcare). The purified protein displayed a single monodisperse
peak in the size exclusion profile (Extended Data Fig. 1a), and was >95% pure as
judged by SDS-PAGE gel electrophoresis (Extended Data Fig. 1b).
Crystallization. Purified receptor was concentrated to >30 mg ml⁻¹ using a Vivaspin
concentrator with a 100 kDa molecular weight cut-off (Sartorius) and subjected
to crystallization by the in meso method (ref. 32). The concentrated protein was reconstituted
into a lipid mixture containing monoolein plus 10% (w/w) cholesterol
(Sigma), where the protein solution:lipid mass ratio was 2:3. Receptor and lipid
components were mixed at room temperature using a syringe mixing apparatus.
Crystallization experiments were carried out in 96-well glass sandwich plates (Molecular
Dimensions) by a Gryphon LCP crystallization robot (Art Robbins Instruments)
using a 40 nl protein cubic phase overlaid with 800 nl precipitant solution.
Crystallization plates were incubated at 20 °C and initial crystals appeared after
24 h in a precipitant condition consisting of 100 mM MES pH 6.0, 30% PEG 400,
200 mM sodium formate. Crystals matured to full size in 3 days. Improved crystals
were obtained in a condition consisting of 100 mM sodium citrate pH 5.9, 31%
PEG 400, 200 mM sodium formate, 3% 2,5-hexanediol (Extended Data Fig. 2).
Crystals were cryo-protected by harvesting directly from the LCP/precipitant set-ups
with 100 μm MiTeGen loops and flash freezing in liquid nitrogen.

Data collection and processing. All diffraction data were collected at the 23ID-D
beamline (GM/CA-CAT) at the Advanced Photon Source, Argonne National Laboratory,
which is equipped with a Pilatus3 6M detector. Data sets were acquired using
a 20 μm collimated minibeam with 1.033 Å wavelength X-rays. For a typical crystal,
twenty-five 0.4° oscillation images were collected, with 1 s exposure and without
attenuation of the beam, before radiation damage became excessive. Diffraction data
from 52 crystals were merged into one complete data set. The resolution limit was set
at 2.5 Å after anisotropy analysis with HKL3000 (ref. 33) (Extended Data Table 1).
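
The data-collection numbers above imply a few derived quantities that are easy to check. The following is a back-of-envelope sketch that uses only the values stated in the text, plus the standard conversion E[keV] ≈ 12.398 / λ[Å] and Bragg's law, λ = 2d sin θ. The total rotation is only an upper estimate, because the 25-image figure describes a typical crystal rather than every crystal.

```python
import math

IMAGES_PER_CRYSTAL = 25      # typical crystal, from the text
OSCILLATION_DEG = 0.4        # degrees per image
CRYSTALS = 52                # crystals merged into the data set
WAVELENGTH_A = 1.033         # X-ray wavelength in angstroms
RESOLUTION_A = 2.5           # resolution limit (d spacing) in angstroms
HC_KEV_ANGSTROM = 12.398     # Planck constant times speed of light

# Rotation range covered by one typical crystal, and by all crystals together.
wedge_per_crystal = IMAGES_PER_CRYSTAL * OSCILLATION_DEG
total_rotation = wedge_per_crystal * CRYSTALS

# Photon energy corresponding to the stated wavelength.
photon_energy_kev = HC_KEV_ANGSTROM / WAVELENGTH_A

# Bragg angle (theta) at the resolution limit: sin(theta) = lambda / (2 d).
bragg_angle_deg = math.degrees(math.asin(WAVELENGTH_A / (2 * RESOLUTION_A)))

print(f"wedge per crystal: {wedge_per_crystal:.1f} degrees")
print(f"total rotation:    {total_rotation:.0f} degrees")
print(f"photon energy:     {photon_energy_kev:.2f} keV")
print(f"Bragg angle:       {bragg_angle_deg:.2f} degrees")
```

The small wedge per crystal (about 10°) is why data from many microcrystals had to be merged to reach a complete data set.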
Structure determination and refinement. The structure of hOX2R-PGS was solved
by molecular replacement with Phaser (ref. 34) in Phenix (ref. 35). The PGS domain (PDB accession
2BFW) (ref. 15) and μ-OR (PDB accession 4DKL) (ref. 16) were used as independent search
models after analysis with Sculptor in Phenix (ref. 35). The resulting solution was improved
by auto-building in Buccaneer (ref. 36) and by manual iterative building in Coot (ref. 37) followed
by refinement with Phenix. Translation-libration-screw (TLS) refinement
was employed to model atomic displacement factors, with TLS groups generated
by the TLSMD web server (ref. 38). Initial coordinates and refinement parameters for the
suvorexant ligand were prepared with the PRODRG web server (ref. 39). An elongated
feature in the electron density map, which was observed within the bilayer region,
was modelled as oleic acid. MolProbity (ref. 40) was used to evaluate the final structure.
In the Ramachandran plot, 98.1% of residues were in favoured regions and 1.9% of
residues were in allowed regions. The statistics for data collection and refinement are
included in Extended Data Table 1. Figures were prepared using PyMOL (Schrödinger
LLC). The electrostatic potential surface shown in Fig. 2a was calculated using
DelPhi (ref. 41), and the ligand contact map shown in Fig. 2b was made using LIGPLOT (ref. 42).
Small-molecule docking. Docking calculations were done with DOCK 3.6 (refs
43, 44) and AutoDock (ref. 45) in order to obtain more diverse solutions. Docking of the
three orexin receptor antagonists to hOX2R with AutoDock v.4.2 (ref. 45) used a
static receptor and a flexible ligand. Receptor and ligand preparation was performed
with AutoDock Tools (ADT). The reference grid box (60 × 60 × 60 points and
0.375 Å of grid spacing) surrounded the suvorexant pose in the hOX2R structure,
allowing free ligand rotation and displacement. A genetic algorithm was used for
exhaustive conformational sampling, and run 100 times with different random seeds.
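
The physical size of the docking grid follows directly from the two numbers given. The sketch below assumes that the value 60 counts grid intervals along each axis, so that each edge is 60 × 0.375 Å; if it instead counts grid points, each edge would be one spacing shorter. The box centre and the test coordinates are hypothetical.

```python
def box_edge_lengths(n_intervals, spacing):
    """Edge lengths of a rectangular grid box, in the units of `spacing`."""
    return tuple(n * spacing for n in n_intervals)


def contains(point, centre, edges):
    """True if `point` lies inside the axis-aligned box around `centre`."""
    return all(abs(p - c) <= e / 2 for p, c, e in zip(point, centre, edges))


edges = box_edge_lengths((60, 60, 60), 0.375)   # angstroms
centre = (0.0, 0.0, 0.0)                        # hypothetical ligand centroid

print(edges)
print(contains((10.0, -5.0, 11.0), centre, edges))   # within 11.25 Å on every axis
print(contains((12.0, 0.0, 0.0), centre, edges))     # beyond the box along x
```

A cube of roughly 22.5 Å on each side comfortably encloses a ligand the size of suvorexant, which is consistent with the statement that the box allowed free ligand rotation and displacement.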

Docking of all compounds was also performed with DOCK 3.6 (refs 43, 44). 
Anchor spheres to guide the placement of the molecules were distributed based on 
the molecular surface of the receptor and the pose of suvorexant in the hOX2R structure.
The receptor was fixed during calculations and prepared for docking such
that ionizable side chains were charged, except for histidines, for which protona- 
tion was modelled based on protein environment. 

To further enrich conformational space, small-molecule conformations were generated
with OMEGA (ref. 46) (OpenEye), using default settings except for the forcefield
(mmff94s); an increased maximum number of conformations (300); an enlarged
energy window (20); and a decreased r.m.s.d. cut-off (0.3). Representative conformations
were then manually positioned in the binding pocket and minimized
using the CHARMm22 forcefield (Accelrys), not constraining N324⁶.⁵⁵ and H350⁷.³⁹
to allow for side-chain flips. Expert criteria, namely satisfaction of hydrogen bonds,
matching of polar and apolar groups, and consistency with mutational data, were
used in the inspection and final selection of the poses. Finally, all poses, including
the DOCK- and AutoDock-derived ones, were evaluated with the DSX scoring
function (ref. 47). The poses shown were among the ones with the most favourable interaction
scores. Two-dimensional chemical structures were drawn with Marvin 6.2.0
(Chemaxon).

Extended Data Figure 1 | Purification of crystallization-grade hOX2R-PGS. a, Superdex 200 gel filtration profile of hOX2R-PGS purified by nickel and immunoaffinity chromatography. b, Coomassie-stained polyacrylamide gel electrophoresis (PAGE) of the isolated peak fraction from gel filtration.

Extended Data Figure 2 | Lipidic cubic phase crystallization setup for hOX2R-PGS. The image shows representative microcrystals of the hOX2R-PGS protein
that were harvested to produce high-resolution diffraction.

Extended Data Figure 3 | Electron density map for suvorexant and surrounding residues. The 2Fo − Fc electron density map is contoured at 1.2σ.

Extended Data Figure 4 | Sequence alignment between hOX1R and hOX2R. Positions that are identical between the two receptors are highlighted with a red background.
Extended Data Figure 5 | Conservation of the orthosteric binding pocket
between hOX1R and hOX2R. Structure of the extracellular region of hOX2R,
with residues that are identical between hOX1R and hOX2R coloured red, and
residues that are different coloured grey. T111².⁶¹ (to Ser) and T135³.³³ (to Ala)
are the only residues within 6 Å of suvorexant that are different between the two
GPCRs. ECL3 is removed for clarity.
Extended Data Figure 6 | Alternative docked poses for almorexant and 
EMPA. a, Left, chemical structure of almorexant. Right, second docked pose of 
almorexant (green carbons) that was favourably scored and in agreement with 
mutational data. b, Left, chemical structure of EMPA. Right, second docked 
pose of EMPA (cyan carbons) that was favourably scored and in agreement 
with mutational data. 

Extended Data Table 1 | Data collection and refinement statistics

                                   hOX2R-PGS*
Data collection
  Space group                      C2
  Cell dimensions
    a, b, c (Å)                    94.36, 75.82, 96.30
    α, β, γ (°)                    90.00, 111.71, 90.00
  Resolution (Å)                   50.00 (2.50)†
  Rsym or Rmerge                   0.21 (N/A)‡
  I/σI                             10.90 (0.86)
    a*                             (0.26)§
    b*                             (2.00)§
    c*                             (3.80)§
  Completeness (%)                 99.90 (99.00)
  Redundancy                       14.30 (5.9)
Refinement
  Resolution (Å)                   43.70-2.50 (2.6-2.50)
  No. reflections                  18,772
  Rwork/Rfree                      0.19/0.24 (0.26/0.31)
  No. atoms
    Protein                        3,810
    Ligand/ion                     32
    Water                          36
  B-factors
    Receptor                       42.40
    Fusion protein                 48.90
    Ligand/ion                     26.90
    Other (lipid and water)        39.35
  R.m.s. deviations
    Bond lengths (Å)               0.004
    Bond angles (°)                0.77

* Diffraction data from 52 crystals were merged into a complete data set.
† Highest-resolution shell is shown in parentheses.
‡ Rmerge higher than 1 is statistically meaningless, therefore Scalepack (HKL3000, ref. 33) does not report it.
§ Crystals diffracted anisotropically. The correction for anisotropy was applied during scaling with Scalepack (HKL3000). I/σI values (a*, b* and c*) for the highest-resolution shell (2.62-2.5 Å) were calculated by
dividing mean intensity values in each direction by average error values.
CAREERS

Young researchers may be especially vulnerable at field sites. 


SOCIAL BEHAVIOUR 


Indecent advances 


Surveys of sexual harassment and assault during field 
research and on campus reveal a hitherto secret problem. 


BY VIRGINIA GEWIN 


Archaeologist Maureen Meyers never
spoke up about the sexual harassment
she endured from male colleagues and 
superiors at field sites and elsewhere during her 
20-year career. She rebuffed a male colleague's 
propositions, leading to his retaliatory dismissal 
of her diabetes-related diet and medication 
requirements on a later field excursion. A male 
superior once forced her to walk ahead of him 
at a field site “to find the electric fences first” and 
made her listen to his lurid stories. 
Meyers — now at the University of Missis- 
sippi in Oxford — considered abandoning her 


career several times. To help herself deal with 
what had happened, she recorded all her expe- 
riences in her diary. Last autumn, in response 
to the SAFE study documenting sexual harass- 
ment and assault in the field (K. B. H. Clancy 
et al. PLoS ONE 9, e102172; 2014), Meyers 
organized a survey of archaeologists in the 
southeastern United States and learned the 
extent and severity of similar behaviour today. 
The responses to both surveys confirmed that 
she had been far from alone. And she counts 
herself lucky. “I was never physically assaulted,” 
she says. 

Many women who work in scientific dis- 
ciplines involving remote fieldwork have 
experienced similar ordeals. But accounts of
predatory behaviour have largely remained 
shrouded in secrecy, conveyed mostly as 
whispered warnings. Early-career researchers 
— mainly women, although men note harass- 
ment as well — are most vulnerable, yet are 
loath to speak up about sexual harassment, and 
even assault, lest their reputations be tainted 
and their careers damaged as a result of peer 
scepticism or retaliation by the offender. 
According to the United Nations, harass- 
ment is defined as unwelcome sexual advances, 
requests for sexual favours and other verbal or 
physical conduct of an intimate nature. The definition
comprises any such behaviour that creates
a hostile or offensive work environment and can 
include hanging around the victim, making 
unwanted leading remarks or touching them, as 
well as attempted or actual sexual assault. 
Studies also show that sexual harassment is 
usually more about power than about sex, mak- 
ing harassment by senior scientists of their sub- 
ordinates the most difficult to deal with. Those 
who could be vulnerable to sexual harassment 
need to look to their personal safety — yet also 
have the problem of protecting their career. 
Although still dismaying, the outlook may 
slowly be improving. Early-career researchers, 
both men and women, and academic organiza- 
tions are beginning to develop individual and 
collective ways to protect potential and actual 
victims and to raise people's awareness that
harassment and assault continue and of how best
to handle them, whether as victim or colleague.


SHINING A LIGHT 

The incidence of sexual harassment and assault 
at scientific field sites was quantified last July 
when the SAFE study was published. This 
online survey of field scientists uncovered 
a range of negative experiences; nearly two- 
thirds of the 666 respondents, who were mostly 
women, reported being sexually harassed at a 
field site, and one-fifth said that they had been 
sexually assaulted. The findings stunned the 
scientific community and prompted dozens of 
news articles and thousands of social-media 
postings. 

Those findings have sparked more surveys 
that will become the basis for clear guide- 
lines on acceptable behaviour at field sites 
and reporting procedures. In Meyers’ sur- 
vey, for example, conducted at the behest of 
the Southeastern Archaeological Conference 
(SEAC), more than two-thirds of the almost 
600 respondents said that they had experienced
sexual harassment at a field site.
Some 13% said that the harassment directly
affected their careers, forcing them to change
field sites, jobs or research interests, or to leave
the discipline altogether.

RICHARD NOWITZ/GETTY

And more than one-quarter said that the 
harassment had stymied their careers in other 
ways, such as causing them to question their 
abilities and their future in the discipline, fear- 
ing for their safety at field sites and being reluc- 
tant to conduct field research. 

Pat Knezek, now a science administra- 
tor, worked as an astronomer for more than 
20 years. When she was a junior researcher, 
magazine centrefolds were blatantly displayed 
at some US observatories, she says. That is 
not acceptable now, but more subtle preda- 
tory behaviour, such as invitations to junior 
researchers to discuss career prospects one- 
on-one after hours, continues and is harder to 
fight because it is less overt, she says. 

Harassment at field sites is not the only
problem. A slew of high-profile cases at US
universities in the past few years has prompted
federal directives that instruct universities to
respond better to, and prevent, sexual
assault on campus. As a result, there has been
more attention to Title IX, the US federal law
that prohibits sex discrimination (including
sexual harassment or assault) on campus. More
universities are forming offices that address the
response to and prevention of sexual harassment
and violence, says Joan Slavin, director
of Northwestern University's Office of Sexual
Harassment Prevention in Evanston, Illinois.
And some professional scientific societies
are creating guidelines and policies to deter
predatory behaviour and to provide resources
for female researchers who have been
harassed or assaulted.

BE PREPARED
What to do

Prevention tips

• Find out if there are rumours of sexual
harassers in your field

• Familiarize yourself with the
university's sexual-harassment policies
and reporting protocols

• Discuss living arrangements and job
expectations with your supervisor before
going into the field

• Know whom to report sexual
harassment concerns to while in the field

• Speak up if you see others in an
uncomfortable, unsafe situation

How to respond to harassment

• Save every correspondence (text,
e-mail, voice mail, tweet) from the
harasser

• Have witnesses to the harassment
document what they saw

• Confide in a trusted colleague or friend
and discuss the pros and cons of filing
a report

• Contact your university's
ombudsperson, Title IX representative,
Human Resources Office, or Equal
Employment Opportunity Office (any
of these could trigger an investigation,
however)

• Ask about university resources,
including confidential counselling,
no-contact orders issued by the
university, workplace accommodations
(schedule changes, office location
changes, leave of absence), and referrals
to advocates for legal, medical or
housing assistance. V.G.

Most other countries have yet to catch up 
with the United States. Nicole Westmarland, 
co-director of the Centre for Research into 
Violence and Abuse at Durham University, 
UK, says that British efforts to stop harass- 
ment of women in academia are not at US 
levels. A letter that she co-authored in Janu- 
ary in The Telegraph newspaper called for 
more clear-cut university policies on how to 
respond to sexual-assault complaints, and in 
an article in the newspaper a few days later 
she described UK universities’ sexual-assault 
policies as “archaic”. She says that university 
responses to sexual assault are most com- 
monly described as inaction, either because 
sexual assault is a police matter beyond their 
remit or because they do not take disciplinary 
actions against the aggressor. Some Nordic 
universities are training employees to deal 
with sexual-harassment concerns, but many 
think that the issue is also under-studied there. 


FIGHTING HARASSMENT 

Some young researchers are getting creative. 
Upset by accounts of harassment at poster ses- 
sions or of fear of walking back to a hotel or 
campus after a conference party, two female 
postdocs have created a ‘buddy system’ called 
Astronomy Allies that they unveiled in January 
at an American Astronomical Society (AAS) 
meeting in Seattle, Washington. Participants 
volunteer to form a ‘safe zone’ — as a buffer, 
bystander or advocate — for AAS members who 
feel threatened or unsafe. In that case they can 
text or call an ‘ally’ for an escort. 

Astronomy Allies has the support of the 
AAS Committee on the Status of Women in 
Astronomy. “We feel like we are breaking new 
ground by trying to make the community 
look at this issue — and find ways to protect 
the victims without putting ourselves in a 
position where we could get sued,” says Joan 
Schmelz, committee chair and an astronomer 
at the University of Memphis in Tennessee, 
who herself was sexually harassed early in her 
career. For legal reasons, Schmelz can disclose 
no details, but she wrote in a 2011 blogpost that 
it involved both a sexual component and — as 
typifies such behaviour — an abuse of power. 
Evidence of sexual harassment can help prevent 
the abuse, says anthropologist Kate Clancy. 


“At the time, I was a young astronomer in a 
vulnerable position and the harasser was my 
supervisor,” she wrote. She recalled that he told 
her that he wanted to put her in his pocket and 
take her out when it was convenient. 

After that blogpost she became a go-to 
confidante for women grappling with similar 
experiences. Having heard many stories, she 
finds it difficult to offer general advice. “Rarely 
do I recommend filing an official report as a 
first action because it can affect your standing 
in your department and community — espe- 
cially if you don't have a smoking-gun piece 
of evidence,” she says. And publicly naming 
the harasser carries a risk of getting sued for 
defamation of character (see ‘Anatomy of a sex- 
ual-harassment report’). Instead, she advises 
women to write down everything — the time, 
location, nature and details of an incident — 
and to save all evidence, including e-mails, 
texts and voice-mail messages. Then, she 
says, the victim should talk to someone they 
trust about the pros and cons of filing a report 
against the harasser (see ‘What to do’). 

But she and others agree that predatory 
behaviour will stop only when the community 
decides that harassment and assault will not be 
tolerated and creates mechanisms that address 
them and make perpetrators accountable. 

The issue has garnered less attention in sci- 
entific fields with greater gender parity, such as 
ecology. But Jacquelyn Gill, a palaeoecologist at 
the University of Maine in Orono, and Joshua 
Drew, a conservation ecologist at Columbia 
University in New York, will tackle the issue 
with a panel discussion at the August meeting 
of the Ecological Society of America in Balti- 
more, Maryland. “We want to start important 
conversations — for example, sharing univer- 
sity reporting procedures with students in their 
own labs, departments and institutions,” says 
Gill. As a new principal investigator, she feels 


L. BRIAN STAUFFER 


CASE STUDY 


Anatomy of a sexual-harassment report 


Sally Smith (not her real name) was a 
PhD student working at a remote marine 
field station in North America when a 
field-research supervisor propositioned 
her. When she turned down his advances, 
he threatened to bar her access to the 
gear and equipment that she needed to 
complete her fellowship research. Then 
came the domineering body language and 
verbal abuse. 

She told the field-station manager, 
but he did nothing. Well-meaning senior 
women colleagues advised her not to 
draw attention to herself. Confused and 
vulnerable, she was unsure what to do, and 
ended up forgoing her fellowship, unwilling 
to put herself under his control for a second 
field season. But she received an alternative 
source of funding and continued her field 
work in the area — which led to more 
frightening encounters with him. 

Smith wrote down every detail: dates, 


responsible for her graduate students. “We 
need to create a culture where incidents are 
rare and reporting is easy,” she says. 


CULTURAL SHIFT 

The SAFE study is already starting to drive 
change. “While there have been anecdotes and 
whispers about harassment at field sites, scien- 
tists are trained to seek evidence in a methodi- 
cal, quantitative way to confirm the presence 
of a problem,” says Kate Clancy, a co-author of 
the SAFE paper and an anthropologist at the 
University of Illinois at Urbana-Champaign. 
“We gave them the data.” 

And SEAC past president Tristram Kidder, 
an anthropologist at Washington University 
in St Louis, Missouri, is helping to craft clear 
guidelines on professional field conduct and 
expectations as well as on detailed harass- 
ment-reporting procedures. They will be 
published this year. Other organizations in 
Europe and elsewhere are conducting disci- 
pline-based surveys in biology, astronomy, 
ecology and anthropology. 

Some organizations, among them the 
American Geophysical Union, have already 
created a policy. The Association of American 
Geographers will draft guidelines for prevent- 
ing and reporting harassment at its meeting 
in April, and the American Anthropological 
Association last year issued a ‘zero tolerance’ 
stance on sexual harassment and is launching 
an initiative to help members prevent it or deal 
with it when it happens. 

Some groups are raising awareness through 
seminars. The online Earth Science Women’s 
Network, an international peer-mentoring 


times and how the encounters made her 
feel. After her second field season, she 
took those records, along with every e-mail 
he had sent, to the ombudsman’s office 
at her university. After she reported the 
harassment to the university’s human- 
resources department, the perpetrator 
threatened to sue her for defamation. He 
ultimately lost his job, but later secured a 
post elsewhere, and Smith learned that he 
had continued to harass women. 
“Unfortunately, speaking out is not 
always good for one’s career, but it 
was worth the risk for me,” she says. 
Now an assistant professor at a major 
university, Smith makes sure that her 
graduate students are prepared for 
safe, productive field experiences and 
know how to get help should they need 
it. That includes contacting her or the 
university ombudsman’s office if they have 
intimidating encounters. V.G. 


association, last autumn gave a presentation 
on field safety at the University of Wisconsin, 
Madison. “We talked about setting boundaries 
and expectations — about everything from 
living arrangements to working hours — 
before going into the field,” says Erika Marin-
Spiotta, a geographer at the university. 

Others are working to change the culture 
of tacit acceptance nearer to home. Anthro- 
pologist Bob Muckle at Capilano University in 
Vancouver, Canada, says that he was stunned 
by the SAFE results. “I thought the stuff I had 
seen happen to female colleagues in the 1970s 
and 1980s had disappeared,” he says. He has 
instituted a zero-tolerance policy on sexual 
harassment for the summer field school he 
directs, and gives students handouts that 
define harassment and provide contacts and 
phone numbers for reporting any such event. 

Still, it will take more than lone actions or 
a few guidelines to effect a true cultural shift, 
say those who study the problem. Real change 
will come when the international scientific 
community decides, top-down and bottom- 
up, what constitutes acceptable behaviour. 
“Few things are simply a women's issue; this 
is a community issue,” says SAFE co-author 
Julienne Rutherford, a biological anthropolo- 
gist at the University of Illinois at Chicago. 
“Senior people in the hierarchy are more likely 
to be perpetrators. They are also the people 
who have the power to establish appropriate 
behaviour and what is acceptable in our work 
culture.” 


Virginia Gewin is a freelance writer in 
Portland, Oregon. 
SOFTWARE 
Career detective 


Software that can track researchers’ career 
progress is under development. It will 
automate the collection of data required 
to learn how and where young scientists 
get jobs. A team used data collected by 
the tool and by manual analysis to show 
that higher research output correlates 
with scientists’ ability to move voluntarily 
between posts (A. Geuna et al. Res. 

Policy http://doi.org/2hz; 2015). Using 
researchers’ names, the tool can mine 
web pages and CVs to identify affiliations 
and research productivity. The software 
could be used to reconstruct the career 
paths of researchers and to assess which 
factors are correlated with staying in 
academic positions or moving to another 
sector, says lead author Aldo Geuna, an 
economist at the University of Turin in 
Italy. The tool is openly available, he says, 
and developers and users are working to 
improve its algorithms. 


EMPLOYMENT 
Job dissatisfaction lasts 


Women who dislike their job come to hate 
it more over time, even if they earn more, 
whereas men’s job dissatisfaction stays 
much the same regardless of pay, according 
to a UK survey of 2,800 employees, which 
included scientists. Conversely, women 
and men who like their job enjoy it more 
as time passes. Kausik Chaudhuri at the 
University of Leeds, UK, a co-author of the 
study — called ‘Job Satisfaction, Age and 
Tenure’ — says the findings suggest that 

it does not become easier to adapt to a job 
that is not a good fit from the outset. Early- 
career researchers should therefore choose 
carefully in today’s economic climate. 


US RESEARCH FUNDING 
Call to smooth bumps 


Biomedical research advocates in the 
United States are calling for policy changes 
to ease boom-and-bust research-support 
cycles at the US National Institutes of 
Health (NIH). In a joint report, United for 
Medical Research, a research advocacy 
group, and the Information Technology & 
Innovation Foundation, a think tank in 
Washington DC, outline strategies to 
make the NIH budget more certain 

from year to year. These include 
apportioning federal funds for several 
years at a time and stipulating that any 
unspent funds can be rolled over to the 
next fiscal year. 
SCIENCE FICTION 


THE EGG 


All that remains. 


BY S.B. DIVYA 


In the corner of the night-
darkened room, tucked next to 
the sofa, the Egg rested on its 
pedestal like a modern sculpture. Its 
quiet hum was the only sound in the 
apartment; its green indicator the 
only light. The screen on the front 
of the ovoid was dark, not revealing 
the partially formed creature incu- 
bating within. 

That wasn't right. The screen had 
never once been off, not while she 
had been here. She was gone now. 
She had slipped away quietly, with- 
out fuss, much as she’d lived. 

“Promise,” she had demanded, 
her voice raspy, as the smells of 
disinfectant and rot permeated his 
pores. “Promise that you'll keep it 
going.” 

“I promise,” he’d lied. “Don’t 
worry.” He clutched the pills in his 
pocket with one hand. 

In the end, she had been reduced 
to skin and bones. Her hand, clasp- 
ing his, was a papery claw. She had 
always been scrawny. He’d called 
her chicken legs when they first met, 
and she’d retorted with “stupid head”. 
Insults had never been her strong 
point. They were six years old. Love came 
years later, and the cancer not long after that. 

She was cured the first time. A designer 
molecule flooded her system, keeping the 
traitorous cells at bay. 

“Let's have a baby,” she said when hope was 
allowed back into their house. 

“Let’s have two,” he responded, and they 
both grinned like fools and got started. 

They found out not long afterwards that 
the molecule that kept her alive was poison 
to any fetus. They spent the remainder of his 
inheritance on the Egg — and the hormones 
and extractions and fertilizations. 

“It will be every bit your baby,” promised 
the specialists. 

She let them record her heartbeat and 
intestinal sounds for playback. The two of 
them used the microphone daily to stimulate 
budding ear drums. She sang her favourite 
songs in her off-key shower voice. He played 
his guitar and read cooking magazines 
aloud. They stared at the screen in fascina- 
tion, watching it transform from a tadpole to 
an alien. The sofa seat nearest the Egg turned 
into a sinkhole. 


The second cancer snuck in, quiet and 
efficient, while they were busy looking the 
other way. She needed another designer mol- 
ecule, but she was too far down the queue. 
The money that would have bought her way 
higher was gone, so the doctors tried the old-
fashioned poisons. She lost her strength, the 
contents of her stomach and every hair on 
her body, but she didn’t miss a day singing 
to the Egg. 

Watching her reclining against the cylin- 
drical pedestal, forehead resting on the 
warm ovoid above, he loved her even more. 

“You’re beautiful,” he said. 

She grinned, all teeth in a skeletal face. 
“You've never lied to me before.” 

“And I'm not lying now.” 

The second cancer took her swiftly. The 
apartment looked just as it had when they'd 
left for the hospital two days ago, but noth- 
ing was the same. The faint glow of city 
lights bled around the curtain edges, 
beckoning him. He shuffled towards 
it slowly like an old man and tripped 
on the edge of the rug — the rug that 
they'd chosen together to cushion 
tender baby feet and dimpled knees. 

With a trembling hand he 
reached out and turned on the 
screen. It almost looked human 
now, although the head was too 
large and the body too skinny, sort 
of like she had looked in those last 
days of life. His hand moved of its 
own accord, navigating the menu 
screens, delving deep to find that 
buried option that came with every 
Egg. His fingers hovered over the 
number pad. 

“I’m sorry, little one,” he whis- 
pered. “This isn't how the road was 
supposed to go. I wish — if only —” 
He sighed. “I can’t do this alone, 
and there’s no one left for you but 
me, a poor excuse for a father.” He 
drew his hand back. “Wait. Let’s go 
together. I can do that much for you.” 

He stood and walked to the 
kitchen. His steps felt lighter now 
that the decision was made. He 
filled a glass with water, just enough 
to swallow a few pills. As he walked 
the scant distance back to the Egg, 
he reached into his pocket and 
retrieved the tablets. Their small white forms 
gleamed like pearls in his palm. 

He reclined against the Egg, as she had, 
and closed his eyes. You've never lied to me 
before. Her words rattled like marbles in his 
skull. An involuntary tear traced its way 
down the contours of his face. It was the pin- 
hole in the dam, and he felt all his grief push 
against it and then break through. 

The sobs crashed over him in great waves, 
and he wrapped his arms around the warm 
Egg, clinging to it like a buoy in a storm. The 
glass and pills fell from his hands, forgotten 
in the tempest. An eternity passed before he 
went limp from exhaustion and fell asleep, his 
body curled around the Egg’s pedestal. The 
menu system quietly and automatically exited 
to the start, and the screen went black. 